**Christel Baier · Ugo Dal Lago (Eds.)**

# **Foundations of Software Science and Computation Structures**

**21st International Conference, FOSSACS 2018**
**Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018**
**Thessaloniki, Greece, April 14–20, 2018**
**Proceedings**

# Lecture Notes in Computer Science 10803

Commenced Publication in 1973

Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

### Editorial Board

David Hutchison, UK; Takeo Kanade, USA; Josef Kittler, UK; Jon M. Kleinberg, USA; Friedemann Mattern, Switzerland; John C. Mitchell, USA; Moni Naor, Israel; C. Pandu Rangan, India; Bernhard Steffen, Germany; Demetri Terzopoulos, USA; Doug Tygar, USA; Gerhard Weikum, Germany

### Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy
Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany
Benjamin C. Pierce, University of Pennsylvania, USA
Bernhard Steffen, University of Dortmund, Germany
Deng Xiaotie, City University of Hong Kong
Jeannette M. Wing, Microsoft Research, Redmond, WA, USA

More information about this series at http://www.springer.com/series/7407


Editors

Christel Baier
TU Dresden
Dresden, Germany

Ugo Dal Lago
Università di Bologna
Bologna, Italy

ISSN 0302-9743 · ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-89365-5 · ISBN 978-3-319-89366-2 (eBook)
https://doi.org/10.1007/978-3-319-89366-2

Library of Congress Control Number: 2018937398

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2018. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by the registered company Springer International Publishing AG part of Springer Nature

The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

### ETAPS Foreword

Welcome to the proceedings of ETAPS 2018! After a somewhat coldish ETAPS 2017 in Uppsala in the north, ETAPS this year took place in Thessaloniki, Greece. I am happy to announce that this is the first ETAPS with gold open access proceedings. This means that all papers are accessible by anyone for free.

ETAPS 2018 was the 21st instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of five conferences: ESOP, FASE, FoSSaCS, TACAS, and POST. Each conference has its own Program Committee (PC) and its own Steering Committee. The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations to programming language developments, analysis tools, formal approaches to software engineering, and security. Organizing these conferences in a coherent, highly synchronized conference program facilitates participation in an exciting event, offering attendees the possibility to meet many researchers working in different directions in the field, and to easily attend talks of different conferences. Before and after the main conference, numerous satellite workshops take place and attract many researchers from all over the globe.

ETAPS 2018 received 479 submissions in total, 144 of which were accepted, yielding an overall acceptance rate of 30%. I thank all the authors for their interest in ETAPS, all the reviewers for their peer reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2018 was enriched by the unifying invited speaker Martin Abadi (Google Brain, USA) and the conference-specific invited speakers (FASE) Pamela Zave (AT&T Labs, USA), (POST) Benjamin C. Pierce (University of Pennsylvania, USA), and (ESOP) Derek Dreyer (Max Planck Institute for Software Systems, Germany). Invited tutorials were provided by Armin Biere (Johannes Kepler University, Linz, Austria) on modern SAT solving and Fabio Somenzi (University of Colorado, Boulder, USA) on hardware verification. My sincere thanks to all these speakers for their inspiring and interesting talks!

ETAPS 2018 took place in Thessaloniki, Greece, and was organised by the Department of Informatics of the Aristotle University of Thessaloniki. The university was founded in 1925 and currently has around 75,000 students; it is the largest university in Greece. ETAPS 2018 was further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology). The local organization team consisted of Panagiotis Katsaros (general chair), Ioannis Stamelos, Lefteris Angelis, George Rahonis, Nick Bassiliades, Alexander Chatzigeorgiou, Ezio Bartocci, Simon Bliudze, Emmanouela Stachtiari, Kyriakos Georgiadis, and Petros Stratis (EasyConferences).

The overall planning for ETAPS is the main responsibility of the Steering Committee, and in particular of its Executive Board. The ETAPS Steering Committee consists of an Executive Board and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Gilles Barthe (Madrid), Holger Hermanns (Saarbrücken), Joost-Pieter Katoen (chair, Aachen and Twente), Gerald Lüttgen (Bamberg), Vladimiro Sassone (Southampton), Tarmo Uustalu (Tallinn), and Lenore Zuck (Chicago). Other members of the Steering Committee are: Wil van der Aalst (Aachen), Parosh Abdulla (Uppsala), Amal Ahmed (Boston), Christel Baier (Dresden), Lujo Bauer (Pittsburgh), Dirk Beyer (Munich), Mikolaj Bojanczyk (Warsaw), Luis Caires (Lisbon), Jurriaan Hage (Utrecht), Rainer Hähnle (Darmstadt), Reiko Heckel (Leicester), Marieke Huisman (Twente), Panagiotis Katsaros (Thessaloniki), Ralf Küsters (Stuttgart), Ugo Dal Lago (Bologna), Kim G. Larsen (Aalborg), Matteo Maffei (Vienna), Tiziana Margaria (Limerick), Flemming Nielson (Copenhagen), Catuscia Palamidessi (Palaiseau), Andrew M. Pitts (Cambridge), Alessandra Russo (London), Dave Sands (Göteborg), Don Sannella (Edinburgh), Andy Schürr (Darmstadt), Alex Simpson (Ljubljana), Gabriele Taentzer (Marburg), Peter Thiemann (Freiburg), Jan Vitek (Prague), Tomas Vojnar (Brno), and Lijun Zhang (Beijing).

I would like to take this opportunity to thank all speakers, attendees, organizers of the satellite workshops, and Springer for their support. I hope you all enjoy the proceedings of ETAPS 2018. Finally, a big thanks to Panagiotis and his local organization team for all their enormous efforts that led to a fantastic ETAPS in Thessaloniki!

February 2018 Joost-Pieter Katoen

### Preface

This volume contains the papers presented at the 21st International Conference on Foundations of Software Science and Computation Structures (FoSSaCS 2018), which was held April 16–19, 2018, in Thessaloniki, Greece. The conference is dedicated to foundational research with a clear significance for software science and brings together research on theories and methods to support the analysis, integration, synthesis, transformation, and verification of programs and software systems.

The program consisted of 31 contributed papers, selected from among 103 submissions. Each submission was reviewed by at least three Program Committee members, with the help of external experts. After a three-day rebuttal phase, the selection was made based on discussions via the EasyChair conference management system, which was also used to assist with the compilation of the proceedings.

We wish to thank all authors who submitted to FoSSaCS 2018, all the Program Committee members for their excellent work, and the external reviewers for their thorough evaluation of the submissions. In addition, we would like to thank the ETAPS organization for providing an excellent environment for FoSSaCS and other conferences and workshops.

March 2018 Christel Baier Ugo Dal Lago

### Organization

#### Program Committee

Andreas Abel, Gothenburg University, Sweden
Christel Baier, TU Dresden, Germany
Nathalie Bertrand, Inria, France
Udi Boker, Interdisciplinary Center (IDC) Herzliya, Israel
Mikolaj Bojanczyk, Warsaw University, Poland
Luis Caires, Universidade NOVA de Lisboa, Portugal
Ugo Dal Lago, University of Bologna, Italy
Yuxin Deng, East China Normal University, China
Mariangiola Dezani-Ciancaglini, Università di Torino, Italy
Ichiro Hasuo, National Institute of Informatics, Japan
Radha Jagadeesan, DePaul University, USA
Stefan Kiefer, University of Oxford, UK
Barbara König, Universität Duisburg-Essen, Germany
David Monniaux, CNRS, VERIMAG, France
Andrzej Murawski, The University of Warwick, UK
Joel Ouaknine, Max Planck Institute for Software Systems, Germany
Catuscia Palamidessi, Inria, France
Kirstin Peters, TU Berlin, Germany
Damien Pous, CNRS, ENS Lyon, France
Jean-Francois Raskin, Université Libre de Bruxelles, Belgium
Helmut Seidl, Technical University of Munich, Germany
Alexandra Silva, University College London, UK
Alex Simpson, University of Ljubljana, Slovenia
Jiri Srba, Aalborg University, Denmark
Jean-Marc Talbot, Aix-Marseille Université, France
Christine Tasson, Université Denis Diderot, France
Kazushige Terui, Kyoto University, Japan

#### Additional Reviewers

Aler Tubella, Andrea; Almagor, Shaull; Asada, Kazuyuki; Atkey, Robert; Bacci, Giorgio; Bacci, Giovanni; Bagnol, Marc; Baldan, Paolo; Basold, Henning; Bavera, Francisco; Beffara, Emmanuel; Benveniste, Albert; Beohar, Harsh; Berardi, Stefano; Bertolissi, Clara; Berwanger, Dietmar; Blondin, Michael; Bocchi, Laura; Boreale, Michele; Boulmé, Sylvain; Bouyer, Patricia; Brazdil, Tomas; Brotherston, James; Brunet, Paul; Bruni, Roberto; Bucchiarone, Antonio; Busatto-Gaston, Damien; Bønneland, Frederik M.; Cabrera, Benjamin; Cadilhac, Michaël; Carayol, Arnaud; Castellan, Simon; Chen, Tzu-Chun; Clouston, Ranald; Cockx, Jesper; Coppo, Mario; Corbineau, Pierre; Cristescu, Ioana; Doumane, Amina; Dubut, Jérémy; Eberhart, Clovis; Emmi, Michael; Enea, Constantin; Enevoldsen, Søren; Enqvist, Sebastian; Exibard, Léo; Falcone, Ylies; Feng, Yuan; Figueira, Diego; Fijalkow, Nathanaël; Fournier, Paulin; Fujii, Soichiro; Galmiche, Didier; Geeraerts, Gilles; Genest, Blaise; Gorogiannis, Nikos; Graham-Lengrand, Stéphane; Grellois, Charles; Haar, Stefan; Haase, Christoph; Halfon, Simon; Hartmann, Nico; Hautem, Quentin; Hirschkoff, Daniel; Hirschowitz, Tom; Hsu, Justin; Huang, Mingzhang; Jacobs, Bart; Jacquemard, Florent; Jansen, Nils; Jaskelioff, Mauro; Jecker, Ismaël; Junges, Sebastian; Kakutani, Yoshihiko; Kanovich, Max; Kaufmann, Isabella; Kerjean, Marie; King, Andy; Klein, Felix; Klin, Bartek; Kołodziejczyk, Leszek; Kretinsky, Jan; Krivine, Jean; Kupke, Clemens; Kutsia, Temur; Küpper, Sebastian; Laarman, Alfons; Laird, Jim; Lanese, Ivan; Lang, Frederic; Lazic, Ranko; Lefaucheux, Engel; Leifer, Matthew; Lepigre, Rodolphe; Letouzey, Pierre; Levy, Paul Blain; Li, Xin; Liang, Hongjin; Licata, Daniel R.; Litak, Tadeusz; Lohrey, Markus; Lombardy, Sylvain; Long, Huan; Luttik, Bas; López, Hugo A.; Mackie, Ian; Madnani, Khushraj; Maggi, Fabrizio Maria; Mallet, Frederic; Maranget, Luc; Markey, Nicolas; Martens, Wim; Mayr, Richard; Mazowiecki, Filip; Mikučionis, Marius; Milius, Stefan; Mio, Matteo; Moggi, Eugenio; Monmege, Benjamin; Muniz, Marco; Nestmann, Uwe; New, Max; Nielsen, Mogens; Nolte, Dennis; Nordvall Forsberg, Fredrik; Nyman, Ulrik; Okudono, Takamasa; Orchard, Dominic; Oualhadj, Youssouf; Padovani, Luca; Panangaden, Prakash; Pang, Jun; Pavlovic, Dusko; Perez, Guillermo; Pitts, Andrew; Plump, Detlef; Pouly, Amaury; Power, John; Pruekprasert, Sasinee; Ramsay, Steven; Regnier, Laurent; Rehak, Vojtech; Roggenbach, Markus; Rot, Jurriaan; Sacerdoti Coen, Claudio; Sammartino, Matteo; Sankur, Ocan; Saurin, Alexis; Schalk, Andrea; Scherer, Gabriel; Schmidt-Schauß, Manfred; Selinger, Peter; Shirmohammadi, Mahsa; Sickert, Salomon; Sighireanu, Mihaela; Sistla, A. Prasad; Sojakova, Kristina; Soloviev, Sergei; Sozeau, Matthieu; Sprunger, David; Strassburger, Lutz; Tang, Qiyi; Torres Vieira, Hugo; Tsuiki, Hideki; Tsukada, Takeshi; Turrini, Andrea; Tzevelekos, Nikos; Valencia, Frank; Valiron, Benoît; van Ditmarsch, Hans; Varacca, Daniele; Vial, Pierre; Vicary, Jamie; Vijayaraghavan, Muralidaran; Villevalois, Didier; Waga, Masaki; Wagner, Christoph; Wojtczak, Dominik; Wolff, Sebastian; Worrell, James; Yamada, Akihisa; Yang, Pengfei; Yoshimizu, Akira; Yu, Tingting; Zimmermann, Martin

### Contents



#### Lambda-Calculi and Types


#### Category Theory and Quantum Control


#### Quantitative Models


#### Logics and Equational Theories


# Semantics

### **Non-angelic Concurrent Game Semantics**

Simon Castellan¹, Pierre Clairambault², Jonathan Hayman³, and Glynn Winskel³

¹ Imperial College London, London, UK
simon@phis.me
² Univ Lyon, CNRS, ENS de Lyon, UCB Lyon 1, LIP, Lyon, France
³ Computer Laboratory, University of Cambridge, Cambridge, UK

**Abstract.** The *hiding* operation, crucial in the compositional aspect of game semantics, removes computation paths not leading to observable results. Accordingly, games models are usually biased towards *angelic* non-determinism: diverging branches are forgotten.

We present here new categories of games, not suffering from this bias. In our first category, we achieve this by avoiding hiding altogether; instead, morphisms are *uncovered* strategies (with neutral events) up to *weak bisimulation*. Then, we show that by hiding only certain events dubbed *inessential* we can consider strategies up to *isomorphism*, and still get a category – this partial hiding remains sound up to weak bisimulation, so we get a concrete representation of programs (as in standard concurrent games) while avoiding the angelic bias. These techniques are illustrated with an interpretation of affine nondeterministic PCF which is adequate for weak bisimulation, and for may, must and fair convergences.

#### **1 Introduction**

Game semantics represents programs as strategies for two-player games determined by the types. Traditionally, a strategy is simply a collection of execution traces, each presented as a play (a structured sequence of events) on the corresponding game. Beyond giving a compositional framework for the formal semantics of programming languages, game semantics proved exceptionally versatile, providing very precise (often fully abstract) models of a variety of languages and programming features. One of its rightly celebrated achievements is the realisation that combinations of certain effects, such as various notions of state or control, could be characterised via corresponding conditions on strategies (innocence, well bracketing, . . . ) in a single unifying framework. This led Abramsky to propose the *semantic cube* programme [1], aiming to extend this success to further programming features: concurrency, non-determinism, probabilities, etc.

However, this elegant picture soon showed some limitations. While indeed the basic category of games was successfully extended to deal with concurrency [10,13], non-determinism [11], and probabilities [9] among others, these extensions (although fully abstract) are often incompatible with each other, and really, incompatible as well with the central condition of innocence. Hence a semantic hypercube encompassing all these effects remained out of reach. It is only recently that some new progress has been made with the discovery that some of these effects could be reconciled in a more refined, more intensional games framework. For instance, in [6,16] innocence is reconciled with non-determinism, and in [15] with probabilities. In [7], innocence is reconciled with concurrency.

But something is still missing: the works above dealing with non-deterministic innocence consider only *may-convergence*; they ignore execution branches leading to divergence. To some extent this seems to be a fundamental limitation of the game semantics methodology: at the heart of the composition of strategies lies the *hiding* operation that removes unobservable events. Diverging paths, by nature non-observable, are forgotten by hiding. Some models of must-testing do exist for particular languages, notably Harmer and McCusker's model for non-deterministic Idealized Algol [11]; the model works by annotating strategies with *stopping traces*, recording where the program may diverge. But this approach again mixes poorly with other constructions (notably innocence), and more importantly, is tied to may and must equivalences. It is not clear how it could be extended to support richer notions of convergence, such as *fair-testing* [2].

Our aim is to present a basis for non-deterministic game semantics which, besides being compatible with innocence, concurrency, *etc*., is not biased towards may-testing; it is *non-angelic*. It should not be biased towards must-testing either; it should in fact be *agnostic* with respect to the testing equivalence, and support them all. Clearly, for this purpose it is paramount to remember the non-deterministic branching information; indeed, in the absence of that information, notions such as *fair-testing* are lost. In fact, there has been a lot of activity in the past five years or so around games models that *do* observe the branching information. It is a feature of Hirschowitz's work presenting strategies as presheaves or sheaves on certain categories of cospans [12]; of Tsukada and Ong's work on nondeterministic innocence via sheaves [16]; and of our own line of work presenting strategies as certain event structures [5,7,14].

But observing branching information is not sufficient. Of the works mentioned above, those of Tsukada and Ong and our own previous work are still angelic, because they rely on hiding for composition. On the other hand, Hirschowitz's work gets close to achieving our goals; by refraining from hiding altogether, his model constructs an agnostic and precise representation of the operational behaviour of programs, on which he then considers fair-testing. But by not considering hiding he departs from the previous work and methods of game semantics, and from the methodology of denotational semantics. In contrast, we would like an agnostic games model that still has the categorical structure of traditional semantics. A games model with partial hiding was also recently introduced by Yamada [18], albeit for a different purpose: he uses partial hiding to represent normalization steps, whereas we use it to represent fine-grained nondeterminism.

*Contributions.* In this paper, we present the first category of games and strategies equipped to handle non-determinism, but agnostic with respect to the notion of convergence (including fair convergence). We showcase our model by interpreting **APCF**+, an affine variant of non-deterministic PCF: it is the simplest language featuring the phenomena of interest. We show adequacy with respect to may, must and fair convergences. The reader will find in the first author's PhD thesis [3] corresponding results for full non-deterministic PCF (with detailed proofs), and an interpretation of a higher-order language with shared memory concurrency. In [3], the model is proved compatible with our earlier notions of innocence, by establishing a result of full abstraction for may equivalence, for nondeterministic PCF. We have yet to prove full abstraction in the fair and must cases; finite definability does not suffice anymore.

*Outline.* We begin Sect. 2 by introducing **APCF**+. To set the stage, we describe an angelic interpretation of **APCF**<sup>+</sup> in the category **CG** built in [14] with strategies up to isomorphism, and hint at our two new interpretations. In Sect. 3, starting from the observation that the cause of "angelism" is hiding, we omit it altogether, constructing an *uncovered* variant of our concurrent games, similar to that of Hirschowitz. Despite not hiding, when restricting the location of nondeterministic choices to internal events, we can still obtain a category up to *weak bisimulation*. But weak bisimulation is not perfect: it does not preserve must-testing, and is not easily computed. So in Sect. 4, we reinstate some hiding: we show that by hiding all synchronised events except some dubbed *essential*, we arrive at the best of both worlds. We get an agnostic category of games and strategies *up to isomorphism*, and we prove our adequacy results.

#### **2 Three Interpretations of Affine Nondeterministic PCF**

#### **2.1 Syntax of APCF<sup>+</sup>**

The language **APCF**<sup>+</sup> extends affine PCF with a nondeterministic boolean choice, choice. Its types are A, B ::= 𝔹 | A ⊸ B, where A ⊸ B represents affine functions from A to B. The following grammar describes terms of **APCF**<sup>+</sup>:

$$M, N ::= x \mid M\,N \mid \lambda x.\,M \mid \mathtt{tt} \mid \mathtt{ff} \mid \mathtt{if}\ M\,N_1\,N_2 \mid \mathtt{choice} \mid \bot$$

Typing rules are standard; we show application and conditionals. As usual, a conditional eliminating to arbitrary types can be defined as syntactic sugar.

$$\frac{\Gamma \vdash M : A \multimap B \qquad \Delta \vdash N : A}{\Gamma, \Delta \vdash M\,N : B} \qquad \frac{\Gamma \vdash M : \mathbb{B} \qquad \Delta \vdash N_1 : \mathbb{B} \qquad \Delta \vdash N_2 : \mathbb{B}}{\Gamma, \Delta \vdash \text{if}\ M\,N_1\,N_2 : \mathbb{B}}$$

The first rule is *multiplicative*: Γ and Δ are disjoint. The operational semantics is that of PCF, extended with the (only) two nondeterministic rules choice → tt and choice → ff.
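To make the operational reading concrete, here is a minimal sketch (with illustrative term constructors, not taken from the paper) of an evaluator that collects the set of booleans a closed **APCF**<sup>+</sup> term may converge to; the two rules for choice show up as a two-element result set, and ⊥ contributes nothing:

```python
# Term constructors (illustrative encoding, not from the paper):
# ("var",x) ("lam",x,body) ("app",f,a) ("tt",) ("ff",)
# ("if",c,t,e) ("choice",) ("bot",)

def subst(term, x, val):
    """Substitute val for x; arguments are closed here, so no renaming is needed."""
    tag = term[0]
    if tag == "var":
        return val if term[1] == x else term
    if tag == "lam":
        return term if term[1] == x else ("lam", term[1], subst(term[2], x, val))
    if tag == "app":
        return ("app", subst(term[1], x, val), subst(term[2], x, val))
    if tag == "if":
        return ("if",) + tuple(subst(t, x, val) for t in term[1:])
    return term  # tt, ff, choice, bot

def results(term):
    """Set of booleans the term may evaluate to; diverging branches yield nothing."""
    tag = term[0]
    if tag in ("tt", "ff"):
        return {tag}
    if tag == "bot":
        return set()          # divergence: no observable result
    if tag == "choice":
        return {"tt", "ff"}   # choice -> tt and choice -> ff
    if tag == "app":
        f = term[1]
        if f[0] == "lam":     # affine beta-reduction, call-by-name
            return results(subst(f[2], f[1], term[2]))
        return set()
    if tag == "if":
        out = set()
        for b in results(term[1]):
            out |= results(term[2] if b == "tt" else term[3])
        return out
    return set()

# M = (lambda b. if b tt bot) choice : may converge to tt, may diverge
M = ("app", ("lam", "b", ("if", ("var", "b"), ("tt",), ("bot",))), ("choice",))
print(results(M))   # {'tt'}: the diverging branch leaves no trace (the angelic view)
```

The final example previews the angelic bias discussed below: collecting only convergent results makes M indistinguishable from tt.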

#### **2.2 Game Semantics and Event Structures**

Game semantics interprets an open program by a strategy, recording the behaviour of the program (Player) against the context (Opponent) in a two-player game. Usually, the executions recorded are represented as *plays*, *i.e.* linear sequences of computational events called *moves*; a strategy is then a set of such plays. For instance, the nondeterministic boolean would be represented as the (even-prefix closure of the) set of plays {q⁻ · tt⁺, q⁻ · ff⁺} on the game for booleans. In the play q⁻ · tt⁺, the context starts the computation by asking the value of the program (q⁻) and the program replies (tt⁺). Polarity indicates the origin, Program (+) or Opponent/Environment (−), of the event.
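The play-based presentation above can be sketched directly, with plays as tuples of moves and the even-length prefix closure computed explicitly (a toy encoding assuming nothing beyond the text above):

```python
def even_prefix_closure(plays):
    """Close a set of plays (tuples of moves) under even-length prefixes,
    as strategies are traditionally presented."""
    out = set()
    for p in plays:
        for n in range(0, len(p) + 1, 2):  # keep only even-length prefixes
            out.add(p[:n])
    return out

# The nondeterministic boolean on the game for booleans:
# q- then tt+, or q- then ff+.
strat = even_prefix_closure({("q-", "tt+"), ("q-", "ff+")})
print(sorted(strat))  # [(), ('q-', 'ff+'), ('q-', 'tt+')]
```

Note that this flat set of traces records *that* both answers are possible, but not *where* the branching happens; that is exactly the information event structures will add.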

Being based on sequences of moves, traditional game semantics handles concurrency via interleavings [10]. In contrast, in concurrent games [14], plays are generalised to partial orders which can express concurrency as a primitive. For instance, the execution of a parallel implementation of and against the context (tt, tt) gives the following partial order:

[Diagram: the interpretation of and against (tt, tt), drawn as a partial order over 𝔹 ⇒ 𝔹 ⇒ 𝔹: Opponent's question (−) on the output enables two concurrent Player questions (+), one on each argument; the two Opponent answers tt (−) then enable Player's answer tt (+) on the output.]

In this picture, the usual chronological linear order is replaced by an explicit partial order representing **causality**. Moves are concurrent when they are incomparable (as the two Player questions here). Following the longstanding convention in game semantics, we show which component of the type a computational event corresponds to by displaying it under the corresponding occurrence of a ground type. For instance, in this diagram, Opponent first triggers the computation by asking the output value, and then and concurrently evaluates its two arguments. The arguments having evaluated to tt, and can finally answer Opponent's initial question and provide the output value.

In [7], we have shown how deterministic pure functional parallel programs can be interpreted (in a *fully abstract* way) using such representations.

*Partial-Orders and Non-determinism.* To represent nondeterminism in this partial order setting, one possibility is to use sets of partial orders [4]. This representation suffers however from two drawbacks: firstly it forgets the point of non-deterministic branching; secondly, one cannot talk of an *occurrence* of a move independently of an execution. Those issues are solved by moving to *event structures* [17], where the nondeterministic boolean can be represented as:

[Diagram: the event structure for the nondeterministic boolean: q⁻ enables both tt⁺ and ff⁺, which are in conflict.]

The wiggly line (∼) indicates *conflict*: the boolean values cannot coexist in an execution. Together this forms an *event structure*, defined formally later.

#### **2.3 Interpretations of APCF<sup>+</sup> with Event Structures**

Let us introduce informally our interpretations by showing which event structures they associate to certain terms of **APCF**+.

**Angelic Covered Interpretation.** Traditional game semantics interpretations of nondeterminism are angelic (with exceptions, see *e.g.* [11]); they only describe what terms may do, and forget where they might get stuck. The interpretation of M = (λb. if b tt <sup>⊥</sup>) choice, for instance, in usual game semantics is the same as that of tt. This is due to the nature of composition, which tends to forget paths that do not lead to a value. Consider the strategy for the function λb. if b tt <sup>⊥</sup>:

[Diagram: after the initial Opponent question on the output, the strategy asks its boolean argument; on the answer tt it replies tt on the output, while on ff it has no further move (⊥).]

The interpretation of M arises as the *composition* of this strategy with the nondeterministic boolean. Composition is defined in two steps: interaction (Fig. 1a) and then hiding (Fig. 1b). Hiding removes intermediate behaviour which does not correspond to visible actions in the output type of the composition.

Hiding is crucial in order for composition to satisfy basic categorical properties (without it, the identity candidate, copycat, is not even idempotent). Strategies on event structures are usually considered *up to isomorphism*, which is the strongest equivalence relation that makes sense. Without hiding, there is no hope of recovering categorical laws up to isomorphism. However, it turns out that, treating events in the middle as τ-transitions (∗ in Fig. 1a), weak bisimulation equates enough strategies to get a category. Following these ideas, a category of *uncovered* strategies up to *weak bisimilarity* is built in Sect. 3.

**Fig. 1.** Three interpretations of (λb. if b tt ⊥) choice

**Interpretation with Partial Hiding.** However, considering uncovered strategies up to weak bisimulation blurs their concrete nature; *causal information* is lost, for instance. Moreover, checking for weak bisimilarity is computationally expensive, and because of the absence of hiding, a term evaluating to **skip** may yield a very large representative. However, there is a way to cut down the strategies to reach a compromise between hiding *no* internal events and hiding *all* of them, which collapses to an angelic interpretation.

In our games based on event structures, having an unambiguous notion of an occurrence of an event allows us to give a simple definition of the internal events we need to retain (Definition 9). Hiding the other internal events yields a strategy still weakly bisimilar to the original (uncovered) strategy, while allowing us to get a category *up to isomorphism*. The interpretation of M in this setting appears in Fig. 1c. As before, only the events under the result type (not labelled ∗) are now *visible*, *i.e.* observable by a context. But the events corresponding to the argument evaluation are only partially hidden; those remaining are considered *internal*, treated like τ*-transitions*. Because of their presence, the partial hiding performed loses no information (*w.r.t.* the uncovered interpretation) up to weak bisimilarity. But we have hidden enough so that the required categorical laws between strategies hold *w.r.t.* isomorphism. The model is more precise and concrete than that of weak bisimilarity, preserves causal information, and preserves must-convergence (unlike weak bisimilarity).

Following these ideas, a category of partially covered strategies up to iso (the target of our adequacy results) is constructed in Sect. 4.

### **3 Uncovered Strategies up to Weak Bisimulation**

We now construct a category of "uncovered strategies", up to weak bisimulation. Uncovered strategies are very close to the *partial strategies* of [8], but [8] focused on connections with operational semantics rather than categorical structure.

#### **3.1 Preliminaries on Event Structures**

**Definition 1.** *An event structure is a triple* (E, ≤_E, Con_E) *where* (E, ≤_E) *is a partial order and* Con_E *is a non-empty collection of finite subsets of* E *called* consistent sets, *subject to the following axioms:*

1. *if* Y ⊆ X ∈ Con_E *then* Y ∈ Con_E;
2. *for all* e ∈ E, *the set* [e] = {e′ ∈ E | e′ ≤_E e} *is finite, and* {e} ∈ Con_E;
3. *if* X ∈ Con_E *and* e ≤_E e′ *for some* e′ ∈ X, *then* X ∪ {e} ∈ Con_E.
A down-closed subset of events whose finite subsets are all consistent is called a **configuration**. The set of finite configurations of E is denoted 𝒞(E). If x ∈ 𝒞(E) and e ∉ x, we write x −⊂ᵉ x′ when x′ = x ∪ {e} ∈ 𝒞(E); this is the **covering relation** between configurations, and we say that e gives an **extension** of x. Two extensions e and e′ of x are **compatible** when x ∪ {e, e′} ∈ 𝒞(E), and **incompatible** otherwise. In the latter case, we have a **minimal conflict** between e and e′ **in context** x (written e ∼_x e′).

These event structures are based on *consistent sets* rather than the more commonly encountered binary *conflict* relation. Consistent sets are more general, and more handy mathematically, but throughout this paper, event structures concretely represented in diagrams will only use *binary conflict*, *i.e.* the relation e ∼_x e′ does not depend on x, meaning e ∼_y e′ whenever y extends with e and with e′ – in which case we simply write e ∼ e′. Then consistent sets can be recovered as those finite X ⊆ E such that ¬(e ∼ e′) for all e, e′ ∈ X. Our diagrams display the relation ∼, along with the *Hasse diagram* of ≤_E, called **immediate causality** and denoted by →_E. All the diagrams above denote event structures. The missing ingredient in making the diagrams formal is the *names* accompanying the events (**q**, tt, ff, . . . ). These will arise as annotations by events from *games*, themselves event structures, representing the types.
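The notions of down-closure, consistency, and configuration can be sketched concretely for the binary-conflict case; the representation below, with immediate causes as a dictionary and conflict as a set of pairs, is an illustrative encoding rather than anything from the paper:

```python
from itertools import combinations

class EventStructure:
    """Event structure with binary conflict (a sketch): immediate causality
    as a dict mapping each event to its immediate causes, conflict as a
    symmetric set of pairs."""
    def __init__(self, events, causes, conflict):
        self.events = set(events)
        self.causes = {e: set(causes.get(e, ())) for e in events}
        self.conflict = {frozenset(p) for p in conflict}

    def down_closure(self, xs):
        seen, todo = set(), list(xs)
        while todo:
            e = todo.pop()
            if e not in seen:
                seen.add(e)
                todo += self.causes[e]
        return seen

    def consistent(self, xs):
        # finite X is consistent iff no two of its events are in conflict
        return all(frozenset((a, b)) not in self.conflict
                   for a, b in combinations(xs, 2))

    def is_configuration(self, xs):
        xs = set(xs)
        return xs == self.down_closure(xs) and self.consistent(xs)

# Nondeterministic boolean: q- causes tt+ and ff+, which are in conflict.
B = EventStructure({"q-", "tt+", "ff+"},
                   {"tt+": {"q-"}, "ff+": {"q-"}},
                   [("tt+", "ff+")])
print(B.is_configuration({"q-", "tt+"}))         # True
print(B.is_configuration({"tt+"}))               # False: not down-closed
print(B.is_configuration({"q-", "tt+", "ff+"}))  # False: conflicting answers
```

The three checks mirror the diagram of the nondeterministic boolean: each answer requires the question, and the two answers never coexist.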

The **parallel composition** E₀ ∥ E₁ of event structures E₀ and E₁ has as *events* ({0} × E₀) ∪ ({1} × E₁). The *causal order* is given by (i, e) ≤_{E₀∥E₁} (j, e′) when i = j and e ≤_{E_i} e′, and the *consistent sets* are those finite subsets of E₀ ∥ E₁ that project to consistent sets in both E₀ and E₁.
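The parallel composition can be sketched directly from this description. The triple-based encoding below (events, order pairs, consistency predicate) is an illustrative assumption, not notation from the paper.

```python
# An event structure as a triple (events, order, consistent):
#   order: set of pairs (e, e') with e ≤ e'
#   consistent: predicate on frozensets of events
def parallel(es0, es1):
    """Parallel composition E0 ∥ E1 (a sketch): events are tagged copies
    ({0} × E0) ∪ ({1} × E1), causality acts component-wise, and a finite
    set is consistent iff both of its projections are."""
    ev0, ord0, con0 = es0
    ev1, ord1, con1 = es1
    events = {(0, e) for e in ev0} | {(1, e) for e in ev1}
    order = {((i, e), (i, f)) for i, ordi in ((0, ord0), (1, ord1))
             for (e, f) in ordi}
    def consistent(xs):
        return (con0(frozenset(e for i, e in xs if i == 0)) and
                con1(frozenset(e for i, e in xs if i == 1)))
    return events, order, consistent

# Two side-by-side copies of the boolean shape: q ≤ tt, q ≤ ff, tt ∼ ff.
boolean = ({'q', 'tt', 'ff'},
           {('q', 'tt'), ('q', 'ff')},
           lambda xs: not {'tt', 'ff'} <= set(xs))
two_bools = parallel(boolean, boolean)
```

Note that no causal link and no conflict relates the two components: only the projections constrain consistency.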

A **(partial) map of event structures** f : A ⇀ B is a (partial) function on events which *(1)* maps any finite configuration of A to a configuration of B, and *(2)* is locally injective: if a, a′ ∈ x ∈ *C*(A) and f a = f a′ (both defined), then a = a′. We write *E* for the category of event structures and total maps, and *E*⊥ for the category of event structures and partial maps.

An **event structure with partial polarities** is an event structure A with a map *pol* : A → {−, +, ∗} (events being labelled "negative", "positive" or "internal" respectively). It is a **game** when no events are internal. The dual A⊥ of a game A is obtained by reversing polarities. Parallel composition extends naturally to games. If x and y are configurations of an event structure with partial polarities, we write x ⊆^p y, where p ∈ {−, +, ∗}, for x ⊆ y & *pol*(y \ x) ⊆ {p}.

Given an event structure E and a subset V ⊆ E of events, there is an event structure E ↓ V whose events are those of V, and whose causality and consistency are inherited from E. This construction, called the **projection** of E to V, is used in [14] to perform hiding during composition.

#### **3.2 Definition of Uncovered Pre-strategies**

As in [14], we first introduce *pre-strategies* and their composition, and then consider *strategies*, those pre-strategies well-behaved with respect to copycat.

**Uncovered Pre-strategies.** An **uncovered pre-strategy** on a game A is a partial map of event structures σ : S ⇀ A. Events in the domain of σ are called **visible** or **external**, and events outside the domain **invisible** or **internal**. Via σ, visible events inherit polarities from A.

Uncovered pre-strategies are drawn just like the usual strategies of [14]: each event of S is drawn as its label in A if defined, or as ∗ if undefined. The drawing of Fig. 1a is an example of an uncovered pre-strategy. From an uncovered pre-strategy, one can recover a pre-strategy in the sense of [14]: for σ : S ⇀ A, define S↓ = S ↓ dom(σ), where dom(σ) is the domain of σ. By restriction, σ yields σ↓ : S↓ → A, called a **covered pre-strategy**. A configuration x of S decomposes as the disjoint union x↓ ∪ x∗, where x↓ is a configuration of S↓ and x∗ is a set of internal events of S.

A pre-strategy **from a game** A **to a game** B is an (uncovered) pre-strategy on A⊥ ∥ B. An important pre-strategy from a game A to itself is the **copycat pre-strategy**. In A⊥ ∥ A, each move of A appears twice, with dual polarities. The copycat pre-strategy cc_A simply waits for the negative occurrence of a move a before playing the positive occurrence. See [5] for a formal definition.

Isomorphism of strategies [14] can be extended to uncovered pre-strategies:

**Definition 2.** *Pre-strategies* σ : S ⇀ A, τ : T ⇀ A *are isomorphic (written* σ ≅ τ*) if there is an isomorphism* φ : S ≅ T *such that* τ ∘ φ = σ *(equality of partial maps).*

**Interaction of Pre-strategies.** Recall that in the covered case, composition is performed first by interaction, then hiding, where the interaction of pre-strategies is described as their pullback in the category of *total maps* [14]. Even though *E*⊥ has pullbacks, those pullbacks are inadequate to describe interaction. In [8], uncovered strategies are viewed as total maps σ : S → A ∥ N, and their interaction as a pullback involving these. This method has its awkwardness, so instead we give here a direct universal construction of the interaction, replacing pullbacks.

We start with the simpler case of a **closed** interaction of a pre-strategy σ : S ⇀ A against a counter-pre-strategy τ : T ⇀ A⊥. As in [5], we first describe the expected *states* of the closed interaction in terms of *secured bijections*, from which we construct an event structure, before characterising the whole construction via a universal property.

**Definition 3 (Secured bijection).** *Let* **q**, **q**′ *be partial orders and* φ : **q** ≃ **q**′ *a bijection between their carrier sets (not necessarily order-preserving). It is* secured *when the following relation* ◁_φ *on the graph of* φ *is acyclic:*

$$(s, \varphi(s)) \lhd_{\varphi} (s', \varphi(s')) \quad\text{iff}\quad s \twoheadrightarrow_{\mathbf{q}} s' \;\lor\; \varphi(s) \twoheadrightarrow_{\mathbf{q}'} \varphi(s') \tag{1}$$

*If so, the resulting partial order* (◁_φ)^∗ *is written* ≤_φ*.*
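Securedness of a bijection is a finite acyclicity condition, so it can be checked mechanically. In the sketch below (an illustration, not the paper's formalism), the two partial orders are given by their immediate-causality relations as sets of pairs, and a depth-first search looks for a cycle in the relation (1) on the graph of φ.

```python
def is_secured(imm_q, imm_q2, phi):
    """Check Definition 3 (a sketch): phi maps the carrier of q to that
    of q'; imm_q, imm_q2 are the immediate-causality relations (sets of
    pairs). The bijection is secured iff the relation (1) on its graph
    is acyclic, tested here by depth-first search."""
    graph = [(s, phi[s]) for s in phi]
    succ = {p: [q for q in graph if q != p and
                ((p[0], q[0]) in imm_q or (p[1], q[1]) in imm_q2)]
            for p in graph}
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {p: WHITE for p in graph}
    def has_cycle(p):
        colour[p] = GREY
        for q in succ[p]:
            if colour[q] == GREY or (colour[q] == WHITE and has_cycle(q)):
                return True
        colour[p] = BLACK
        return False
    return not any(colour[p] == WHITE and has_cycle(p) for p in graph)
```

For instance, pairing the chain a ↠ b with the chain 1 ↠ 2 order-preservingly is secured, while the order-reversing pairing creates a causal loop and is not.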

Let σ : S ⇀ A and τ : T ⇀ A be partial maps of event structures (we drop polarities, as the construction is completely independent of them). A pair (x, y) ∈ *C*(S) × *C*(T) such that σ↓x = τ↓y ∈ *C*(A) induces a bijection φ_{x,y} : x ∥ y∗ ≃ x∗ ∥ y, defined using local injectivity of σ and τ:

$$\begin{aligned} \varphi_{x,y}(0,s) &= (0,s) && (s \in x_{*}) \\ \varphi_{x,y}(0,s) &= (1, \tau^{-1}(\sigma s)) && (s \in x_{\downarrow}) \\ \varphi_{x,y}(1,t) &= (1,t) && (t \in y_{*}) \end{aligned}$$

The configurations x and y carry partial orders inherited from S and T. Viewing y∗ and x∗ as discrete orders (where the order relation is equality), φ_{x,y} is a bijection between the carrier sets of partial orders. An **interaction state** of σ and τ is a pair (x, y) ∈ *C*(S) × *C*(T) with σ↓x = τ↓y for which φ_{x,y} is secured; as a result, (the graph of) φ_{x,y} is naturally partially ordered. Write *S*_{σ,τ} for the set of interaction states of σ and τ. As usual [5], we can recover an event structure:

**Definition 4 (Closed interaction of uncovered pre-strategies).** *Let* A *be an event structure, and* σ : S ⇀ A *and* τ : T ⇀ A *be partial maps of event structures. The following data defines an event structure* S ∧ T*:*

**Events:** *the interaction states* (x, y) ∈ *S*_{σ,τ} *whose partial order* ≤_{φ_{x,y}} *has a top element;*
**Causality:** (x, y) ≤ (x′, y′) *iff* x ⊆ x′ *and* y ⊆ y′;
**Consistency:** *a finite set of events is consistent when its componentwise union is an interaction state.*

This event structure comes with partial maps Π₁ : S ∧ T ⇀ S and Π₂ : S ∧ T ⇀ T, analogous to the usual projections of a pullback: for (x, y) ∈ S ∧ T, Π₁(x, y) is defined to be s ∈ S whenever the top element of φ_{x,y} is ((0, s), w₂) for some w₂ ∈ x∗ ∥ y. The map Π₁ is undefined only on events of S ∧ T corresponding to internal events of T (*i.e.* (x, y) with the top element of φ_{x,y} of the form ((1, t), (1, t))). The map Π₂ is defined symmetrically, and is undefined on events corresponding to internal events of S. We write σ ∧ τ for σ ∘ Π₁ = τ ∘ Π₂ : S ∧ T ⇀ A.

**Lemma 1.** *Let* σ : S ⇀ A *and* τ : T ⇀ A *be partial maps. Let* (X, f : X ⇀ S, g : X ⇀ T) *be a triple such that the following outer square commutes:*

*If for all* p ∈ X *with* f p *and* g p *defined,* σ(f p) = τ(g p) *is defined, then there exists a unique* ⟨f, g⟩ : X ⇀ S ∧ T *making the two upper triangles commute.*

From this closed interaction, we define the open interaction as in [14]. Given two pre-strategies σ : S ⇀ A⊥ ∥ B and τ : T ⇀ B⊥ ∥ C, their interaction

$$\tau \circledast \sigma : (S \parallel C) \wedge (A \parallel T) \rightharpoonup A^\perp \parallel C$$

is defined as the composite partial map (S ∥ C) ∧ (A ∥ T) ⇀ A ∥ B ∥ C ⇀ A ∥ C, where the "pullback" is first computed ignoring polarities; the codomain of the resulting partial map is A⊥ ∥ C once polarities are reinstated.

**Weak Bisimulation.** To compare uncovered pre-strategies, we cannot use isomorphisms as in [14] since, as hinted earlier, cc_A ⊛ σ comprises synchronisation events not corresponding to events of σ. To solve this, we introduce weak bisimulation between uncovered pre-strategies:

**Definition 5.** *Let* σ : S ⇀ A *and* τ : T ⇀ A *be uncovered pre-strategies. A weak bisimulation between* σ *and* τ *is a relation* *R* ⊆ *C*(S) × *C*(T) *containing* (∅, ∅)*, such that for all* x *R* y*, we have:*

*(1) if* x −⊂ x′ *by a visible event* s*, then* y ⊆^∗ y₀ −⊂ y′ *by a visible event* t *with* τ t = σ s *and* x′ *R* y′*; (2) if* x −⊂ x′ *by an internal event, then* y ⊆^∗ y′ *with* x′ *R* y′*; and symmetrically for extensions of* y*.*

*Two uncovered pre-strategies* σ, τ *are weakly bisimilar (written* σ ≈ τ*) when there is a weak bisimulation between them.*
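On finite behaviours, weak bisimilarity can be decided by the standard relation-refinement algorithm. The sketch below works on ordinary labelled transition systems, with `'*'` marking internal moves; encoding strategies as LTSs of configurations is an assumption made for illustration.

```python
from itertools import product

def weak_bisimilar(lts1, lts2, s1, s2):
    """Greatest weak bisimulation on two finite LTSs, by refinement.
    An LTS is a dict: state -> list of (label, state); '*' is internal."""
    def tau_closure(lts, s):
        seen, todo = {s}, [s]
        while todo:
            x = todo.pop()
            for (lbl, y) in lts.get(x, []):
                if lbl == '*' and y not in seen:
                    seen.add(y); todo.append(y)
        return seen
    def weak_succ(lts, s, a):
        # s ==a==> s' : internal moves, then a (if visible), then internal
        out = set()
        for x in tau_closure(lts, s):
            if a == '*':
                out |= tau_closure(lts, x)
            else:
                for (lbl, y) in lts.get(x, []):
                    if lbl == a:
                        out |= tau_closure(lts, y)
        return out
    states1 = set(lts1) | {y for v in lts1.values() for (_, y) in v}
    states2 = set(lts2) | {y for v in lts2.values() for (_, y) in v}
    rel = set(product(states1, states2))
    changed = True
    while changed:
        changed = False
        for (p, q) in list(rel):
            ok = (all(any((p2, q2) in rel for q2 in weak_succ(lts2, q, a))
                      for (a, p2) in lts1.get(p, [])) and
                  all(any((p2, q2) in rel for p2 in weak_succ(lts1, p, a))
                      for (a, q2) in lts2.get(q, [])))
            if not ok:
                rel.discard((p, q)); changed = True
    return (s1, s2) in rel
```

This checker reproduces the classic phenomenon discussed later in the section: a strategy playing one of two positive moves is *not* weakly bisimilar to the variant that first commits by an internal choice, although prefixing a single internal move before the branch is harmless.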

Associativity of interaction (up to isomorphism, hence up to weak bisimulation) follows directly from Lemma 1. Moreover, it is straightforward to check that weak bisimilarity is a congruence (*i.e.*, compatible with composition).

**Composition of Covered Strategies.** From interaction, we can easily define the composition of covered strategies. If σ : S → A⊥ ∥ B and τ : T → B⊥ ∥ C are covered pre-strategies, their composition (in the sense of [14]) τ ⊙ σ is defined as (τ ⊛ σ)↓. The operation (·)↓ is well-behaved with respect to interaction:

**Lemma 2.** *For* σ, τ *composable pre-strategies,* (τ ⊛ σ)↓ ≅ τ↓ ⊙ σ↓*.*

#### **3.3 A Compact-Closed Category of Uncovered Strategies**

Although we have a notion of morphism between games (pre-strategies) and an associative composition, we do not yet have a category up to weak bisimulation. Unlike in [14], races in a game may prevent copycat on that game from being idempotent (see [3] for a counterexample), which is necessary for it to be an identity. To ensure this, we restrict ourselves to **race-free** games: those such that whenever a configuration x can be extended by events a₁, a₂ of distinct polarities, the union x ∪ {a₁, a₂} is consistent. From now on, games are assumed race-free.
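Race-freedom is a simple closure property of the set of configurations, so it can be checked directly. The sketch below takes the set of finite configurations explicitly (an illustrative encoding, not the paper's):

```python
def race_free(confs, polarity):
    """Race-freedom (a sketch): for every configuration x with two
    distinct one-event extensions a1, a2 of opposite polarities,
    x ∪ {a1, a2} must again be a configuration."""
    confs = {frozenset(c) for c in confs}
    events = set().union(*confs) if confs else set()
    for x in confs:
        exts = [a for a in events - x if x | {a} in confs]
        for a1 in exts:
            for a2 in exts:
                if a1 != a2 and polarity[a1] != polarity[a2] \
                        and x | {a1, a2} not in confs:
                    return False
    return True
```

A game with one negative and one positive move side by side is race-free; making the two moves conflict creates a race.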

**Lemma 3.** *For a race-free game* A*,* cc_A ⊛ cc_A ≈ cc_A*.*

*Proof.* It will follow from the forthcoming Lemma 4.

**Uncovered Strategies.** Finally, we characterise the pre-strategies invariant under composition with copycat. The two ingredients of [5,14], receptivity and courtesy (called *innocence* in [14]), are needed, but they do not suffice: we need a further condition, as witnessed by the following example.

Consider the strategy σ : ⊕₁ ∼ ⊕₂ on the game A = ⊕₁ ∥ ⊕₂, playing nondeterministically one of the two (conflicting) moves. Then the interaction cc_A ⊛ σ is:

$$\ast_1 \twoheadrightarrow \oplus_1 \qquad\qquad \ast_2 \twoheadrightarrow \oplus_2 \qquad\qquad (\ast_1 \sim \ast_2)$$

that is, two internal events ∗₁, ∗₂ in minimal conflict, each enabling the corresponding positive move.

It is not weakly bisimilar to σ: cc_A ⊛ σ can do ∗₁, an internal transition, to which σ can only respond by doing nothing. Then σ can still do ⊕₁ and ⊕₂, whereas cc_A ⊛ σ cannot: it is committed to doing ⊕₁. To solve this problem, we need to force strategies to decide their nondeterministic choices *secretly*, by means of internal events: σ will not be a valid uncovered strategy, but cc_A ⊛ σ will. Indeed, cc_A ⊛ (cc_A ⊛ σ) is weakly bisimilar to cc_A ⊛ σ.

**Definition 6.** *An (uncovered) strategy is a pre-strategy* σ : S ⇀ A *satisfying* **receptivity** *and* **courtesy**, *stated exactly as in [14], together with:*

**Secrecy:** *no positive event of* S *is involved in a minimal conflict: if* s ∼_x s′*, then neither* s *nor* s′ *is positive.*

Receptivity and courtesy are stated exactly as in [14]. As a result, hiding the internal events of an uncovered strategy yields a strategy <sup>σ</sup><sup>↓</sup> in the sense of [14].

For any game A, cc_A is an uncovered strategy: it satisfies secrecy, as its only minimal conflicts are inherited from the game and are between negative events.

**The Category CG⊛.** Our definition of uncovered strategy indeed implies that copycat is neutral for composition.

**Lemma 4.** *Let* σ : S ⇀ A *be an uncovered strategy. Then* cc_A ⊛ σ ≈ σ*.*

The result follows immediately:

**Theorem 1.** *Race-free games and uncovered strategies up to weak bisimulation form a compact-closed category* **CG**⊛*.*

#### **3.4 Interpretation of Affine Nondeterministic PCF**

From now on, strategies are by default considered uncovered. We sketch the interpretation of **APCF**⁺ inside **CG**⊛. As a compact-closed category, **CG**⊛ supports an interpretation of the linear λ-calculus. However, the empty game 1 is not terminal, as there are in general no natural maps A → 1 in **CG**⊛.

**The negative category CG⊛⁻.** We solve this issue as in [4], by restricting to negative strategies and negative games.

**Definition 7.** *An event structure with partial polarities is negative when all its minimal events are negative.*

A strategy σ : S ⇀ A is negative when S is. Copycat on a negative game is negative, and negative strategies are stable under composition:

**Lemma 5.** *There is a subcategory* **CG**⊛⁻ *of* **CG**⊛ *consisting of negative race-free games and negative strategies. It inherits a monoidal structure from* **CG**⊛*, in which the unit (the empty game) is terminal.*

Moreover, **CG**⊛⁻ has products. The **product** A & B of two games A and B has events, causality and polarities as for A ∥ B, but its consistent sets are restricted to those of the form {0} × X or {1} × X, with X consistent in A or B respectively. The **projections** are π_A : CC_A → (A & B)⊥ ∥ A and π_B : CC_B → (A & B)⊥ ∥ B.

Finally, the **pairing** of negative strategies σ : S ⇀ A⊥ ∥ B and τ : T ⇀ A⊥ ∥ C is the obvious map ⟨σ, τ⟩ : S & T ⇀ A⊥ ∥ (B & C), and the laws for the cartesian product follow by direct verification.

We also need a construction to interpret the function space. However, for A and B negative, A⊥ ∥ B is not usually negative. To circumvent this, we introduce a negative variant A ⊸ B, the linear arrow. To simplify the presentation, we only define it in a special case. A game is **well-opened** when it has at most one initial event. When B is well-opened, we define A ⊸ B to be 1 if B = 1, and otherwise A⊥ ∥ B with the exception that every move of A depends on the single minimal move of B. As a result, ⊸ preserves negativity. We get:

**Lemma 6.** *If* B *is well-opened, then* A ⊸ B *is well-opened and is an exponential object of* A *and* B*.*

In other words, well-opened games form an exponential ideal in **CG**⊛⁻. We interpret the types of **APCF**⁺ as well-opened games of **CG**⊛⁻:

$$\llbracket \mathbf{com} \rrbracket = \mathbf{run}^- \twoheadrightarrow \mathbf{done}^+ \qquad \llbracket \mathbb{B} \rrbracket = q^- \twoheadrightarrow (tt^+ \sim ff^+) \qquad \llbracket A \to B \rrbracket = \llbracket A \rrbracket \multimap \llbracket B \rrbracket$$

where, in ⟦𝔹⟧, the two answers tt⁺ and ff⁺ are in conflict.

**Interpretation of Terms.** The interpretation of the affine λ-calculus in **CG**⊛⁻ follows standard methods. First, the primitives tt, ff, ⊥ and if are interpreted as strategies on the corresponding games.

A non-standard point is the interpretation of ⊥: usually interpreted in game semantics by the minimal strategy simply playing q (as will be done in the next section), our interpretation here reflects the fact that ⊥ represents an infinite computation that never returns. Conditionals are implemented as usual:

> ⟦if M N N′⟧ = if ⊛ (⟦M⟧ ∥ ⟨⟦N⟧, ⟦N′⟧⟩).

**Soundness and Adequacy.** We now prove adequacy for various notions of convergence. First, we build an uncovered strategy from the operational semantics.

**Definition 8 (The operational tree).** *Let* M *be a closed term of type* 𝔹*. We define the pre-strategy* t(M) *on* 𝔹 *as follows:*

**Events:** *an initial event* ⊥*, plus one event per derivation* M →* M′*.*
**Causality:** ⊥ *is below all other events, and derivations are ordered by prefix.*
**Consistency:** *a set of events is consistent when its events are pairwise comparable.*
**Labelling:** ⊥ *has label* q*; a derivation* M →* b *where* b ∈ {tt, ff} *is labelled* b*. Other derivations are internal.*
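The construction of Definition 8 can be sketched on a toy reduction relation. The dictionary `STEPS` below is an illustrative assumption (a hypothetical nondeterministic `choice` primitive), not the paper's operational semantics of **APCF**⁺.

```python
# Toy one-step reduction relation (an assumption, for illustration):
# 'choice' reduces nondeterministically to tt or ff.
STEPS = {
    'if choice tt ff': ['if tt tt ff', 'if ff tt ff'],
    'if tt tt ff': ['tt'],
    'if ff tt ff': ['ff'],
}

def operational_tree(m):
    """Events: an initial ⊥ plus one event per derivation M →* M'
    (represented as its sequence of terms), ordered by prefix;
    consistent sets are branches; answers tt/ff are visible, the
    rest internal ('*')."""
    events = [('⊥',), (m,)]
    todo = [(m,)]
    while todo:
        d = todo.pop()
        for m2 in STEPS.get(d[-1], []):
            events.append(d + (m2,))
            todo.append(d + (m2,))
    def label(e):
        if e == ('⊥',):
            return 'q'
        return e[-1] if e[-1] in ('tt', 'ff') else '*'
    def consistent(es):
        # comparable iff one derivation is a prefix of the other
        return all(a[:len(b)] == b or b[:len(a)] == a
                   for a in es for b in es)
    return events, label, consistent

events, label, consistent = operational_tree('if choice tt ff')
```

The two maximal branches of the resulting tree end in the visible answers tt and ff; the two branches together are inconsistent, reflecting the conflict introduced by the nondeterministic step.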

As a result, t(M) is a tree. Our main adequacy result can now be stated:

**Theorem 2.** *For a term* M : 𝔹*,* t(if M tt ff) *and* ⟦M⟧ *are weakly bisimilar.*

We need to consider t(if M tt ff) and not simply t(M) to ensure secrecy. From this theorem, adequacy results for may and fair convergence follow:

**Corollary 1.** *For any term* M : 𝔹*, we have:*

**May:** M →* tt *if and only if* ⟦M⟧ *contains a positive move.*
**Fair:** *for all* M →* M′*,* M′ *can converge, if and only if all finite configurations of* ⟦M⟧ *can be extended to contain a positive move.*

However, we cannot conclude adequacy for must equivalence from Theorem 2. Indeed, must convergence is not in general stable under weak bisimilarity: for instance, (the strategies interpreting) tt and Y(λx. if choice tt x) are weakly bisimilar, but the latter is not must-convergent. To address this, in the next section we refine the interpretation to obtain a closer connection with the syntax.

### **4 Essential Events**

The model presented in the previous section is very operational; configurations of ⟦M⟧ can be seen as derivations in an operational semantics. The price, however, is that besides the interpretation growing dramatically in size, we only get a category up to weak bisimulation, which can be too coarse (for instance, for must convergence). We would like to remove all events that are not relevant to the behaviour of terms up to weak bisimulation. In other words, we want a notion of *essential internal events* that *(1)* suffices to recover all behaviour with respect to weak bisimulation, but *(2)* is not an obstacle to getting a category up to isomorphism (which amounts to cc_A ⊚ σ ≅ σ).

### **4.1 Definition of Essential Events**

As shown before, the loss of behaviours under hiding is due to the disappearance of events participating in conflicts. An internal event may have no visible consequences, yet still be relevant if it is in a minimal conflict; such events are *essential*.

**Definition 9.** *Let* σ : S ⇀ A *be an uncovered pre-strategy. An essential event of* S *is an event* s *which is either visible, or internal and involved in a minimal conflict (that is, such that* s ∼_x s′ *for some* s′, x*).*

Write E_S for the set of essential events of S. Any pre-strategy σ : S ⇀ A induces another pre-strategy *E*(σ) : *E*(S) = S ↓ E_S ⇀ A, called **the essential part** of σ. The following proves that our definition satisfies *(1)*: no behaviour is lost.

**Lemma 7.** *An uncovered pre-strategy* σ : S ⇀ A *is weakly bisimilar to* *E*(σ)*.*

This induces a new notion of (associative) composition keeping only the essential events. For σ : S ⇀ A⊥ ∥ B and τ : T ⇀ B⊥ ∥ C, let τ ⊚ σ = *E*(τ ⊛ σ). We observe that *E*(τ ⊛ σ) ≅ *E*(τ) ⊚ *E*(σ).
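For event structures presented with binary conflict, the essential events of Definition 9 can be computed directly. The sketch below is an illustrative encoding: `causes` lists immediate causes, and `conflict` is assumed closed under inheritance along causality, so a conflict is *minimal* exactly when it is not inherited from a causal predecessor.

```python
def essential_events(events, causes, conflict, visible):
    """Essential events (a sketch for binary conflict): an event is
    essential when it is visible, or when it is in a *minimal*
    conflict, i.e. one not inherited from a strict cause."""
    conflict = {frozenset(c) for c in conflict}
    def strict_causes(e):
        seen, todo = set(), [e]
        while todo:
            for c in causes.get(todo.pop(), ()):
                if c not in seen:
                    seen.add(c)
                    todo.append(c)
        return seen
    def minimal(e, f):
        return (frozenset((e, f)) in conflict
                and all(frozenset((c, f)) not in conflict
                        for c in strict_causes(e))
                and all(frozenset((e, c)) not in conflict
                        for c in strict_causes(f)))
    return {e for e in events
            if e in visible or any(minimal(e, f) for f in events if f != e)}
```

On the earlier example ∗₁ ↠ ⊕₁, ∗₂ ↠ ⊕₂ with ∗₁ ∼ ∗₂, both internal events are essential, while an internal event with no conflict (say ∗₀ ↠ ⊕₀) is inessential and disappears under *E*(·).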

Which pre-strategies compose well with copycat under this new composition?

#### **4.2 Essential Strategies**

We can now state property *(2)*: the events added by composition with copycat are inessential, and hence hidden by the essential composition:

**Theorem 3.** *Let* σ : S ⇀ A *be an uncovered strategy. Then* cc_A ⊚ σ ≅ *E*(σ)*.*

This prompts the following definition. An uncovered pre-strategy σ is **essential** when it is a strategy and, equivalently: *(1)* all its events are essential; *(2)* σ ≅ *E*(σ). We obtain a characterisation of strategies in the spirit of [14]:

**Theorem 4.** *A pre-strategy* σ : S ⇀ A *is essential if and only if* cc_A ⊚ σ ≅ σ*.*

As a result, we get:

**Theorem 5.** *Race-free games, and essential strategies up to isomorphism form a compact-closed category CG.*

**Relationship with Covered Strategies.** Covered strategies can be made into a compact-closed category [5,14]. Remember that the composition of σ : S → A⊥ ∥ B and τ : T → B⊥ ∥ C in **CG** is defined as τ ⊙ σ = (τ ⊛ σ)↓.

**Lemma 8.** *The operation* σ ↦ σ↓ *extends to an identity-on-objects functor from* **CG** *to the category of covered strategies of [14].*

In the other direction, a covered strategy σ on A might not be an essential strategy; in fact, it might not even be an uncovered strategy, as it may fail secrecy. Sending σ to cc_A ⊚ σ delegates its nondeterministic choices to internal events and yields an essential strategy, but this operation is not functorial.

**Relationship Between CG and CG⊛.** The forgetful operation mapping an essential strategy σ to itself, seen as an uncovered strategy, defines a functor **CG** → **CG**⊛. Indeed, if two essential strategies are isomorphic, they are also weakly bisimilar. Moreover, we have τ ⊛ σ ≈ *E*(τ ⊛ σ) = τ ⊚ σ. However, the operation *E*(·) does not extend to a functor in the other direction, even though *E*(τ) ⊚ *E*(σ) ≅ *E*(τ ⊛ σ), as it is defined only on concrete representatives, not on equivalence classes for weak bisimilarity.

#### **4.3 Interpretation of APCF<sup>+</sup>**

We now show that this new category also supports a sound and adequate interpretation of **APCF**<sup>+</sup> for various testing equivalences, including must. As before, we need to construct the category of negative games and strategies.

**Lemma 9.** *There is a cartesian symmetric monoidal category CG*<sup>−</sup> *of negative race-free games and negative essential strategies up to isomorphism. Well-opened negative race-free games form an exponential ideal of CG*<sup>−</sup> *.*

We keep the same interpretation of types of affine nondeterministic PCF. Moreover, the strategy if is essential. As a result, we let:

$$\llbracket \bot \rrbracket_{\circledcirc} = q : \mathbb{B} \qquad\qquad \llbracket \mathsf{if}\ M\,N\,N' \rrbracket_{\circledcirc} = \mathsf{if} \circledcirc \left( \llbracket M \rrbracket_{\circledcirc} \parallel \langle \llbracket N \rrbracket_{\circledcirc}, \llbracket N' \rrbracket_{\circledcirc} \rangle \right)$$

Using *E*(σ ⊛ τ) = *E*(σ) ⊚ *E*(τ), one can prove by induction that for any term M we have ⟦M⟧⊚ = *E*(⟦M⟧). Furthermore, this interpretation permits a stronger link between the operational and the denotational semantics:

**Theorem 6.** *For all terms* M : 𝔹*,* *E*(t(M)) ≅ ⟦M⟧⊚*.*

Theorem 6 implies Theorem 2. It also implies adequacy for must:

**Corollary 2.** *The interpretation* ⟦·⟧⊚ *is adequate for may, fair, and must:* M : 𝔹 *has no infinite derivation if and only if all (possibly infinite) maximal configurations of* ⟦M⟧⊚ *have a positive event.*

This result also implies that ⟦·⟧ is adequate for must.

### **5 Conclusion**

We have described an extension of the games of [14] to uncovered strategies, composed without hiding. It has strong connections with operational semantics, as the interpretations of terms of base type match their trees of reductions. It also forms a compact-closed category up to weak bisimulation, and yields an adequate denotational semantics for programming languages. Identifying the *inessential* events as those responsible for the non-neutrality of copycat, we remove them to obtain a compact-closed category up to isomorphism. Doing so, we obtain our sought-after setting for the denotational semantics of programming languages, one *agnostic* with respect to the chosen testing equivalence. The work blends well with the technology of [7] (symmetry, concurrent innocence), dealing with non-affine languages and characterising the strategies corresponding to pure programs; these developments appear in the first author's PhD thesis [3].

**Acknowledgements.** We gratefully acknowledge the support of the ERC Advanced Grant ECSYM, EPSRC grants EP/K034413/1 and EP/K011715/1, and LABEX MILYON (ANR-10-LABX-0070) of Université de Lyon, within the program "Investissements d'Avenir" (ANR-11-IDEX-0007) operated by the ANR.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **A Trace Semantics for System F Parametric Polymorphism**

Guilhem Jaber¹ and Nikos Tzevelekos²

¹ ENS de Lyon, Université de Lyon, LIP, Lyon, France
² Queen Mary University of London, London, England
nikos.tzevelekos@qmul.ac.uk

**Abstract.** We present a trace model for Strachey parametric polymorphism. The model is built using operational nominal game semantics and captures parametricity by using names. It is used here to prove an operational version of a conjecture of Abadi, Cardelli, Curien and Plotkin which states that Strachey equivalence implies Reynolds equivalence in System F.

### **1 Introduction**

Parametricity was first introduced by Strachey [22] as a way to characterise the behaviour of polymorphic programs as uniform with respect to the type of the arguments provided. He opposed this notion to ad hoc polymorphism, where a function can produce arbitrarily different outputs when provided inputs of different types (for example, an integer and a boolean). To formalise this notion of parametricity, Reynolds introduced relational parametricity [21]. It is defined via an equivalence on programs, which we call Reynolds equivalence, and which generalises logical relations to System F. This equivalence uses arbitrary relations over pairs of types to relate polymorphic programs, so that a parametric program given related arguments as input produces related results. Reynolds parametricity has been developed into a fundamental theory for studying polymorphic programs [1,20,23].

Following results of Mitchell on PER-models of polymorphism [18], Abadi, Cardelli, Curien and Plotkin [1,20] introduced another, more intensional notion of equivalence, called Strachey equivalence. Two terms of System F are Strachey equivalent whenever, by removing all their type annotations, we obtain two βη-equivalent untyped terms. The authors conjectured that Strachey equivalence implies Reynolds equivalence (the converse being easily shown to be false).

In this paper we examine a notion of Reynolds equivalence based on operational logical relations, and prove that, for this notion, the conjecture holds. To do so, we introduce a trace model for System F based on operational nominal game semantics [12,14]. Terms in our model are denoted as sets of traces, generated by a labelled transition system, which represent interactions with arbitrary term contexts. In order to abstract away type information from inputs to polymorphic functions, our semantics uses *names* to model such inputs. The idea is

© The Author(s) 2018

C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 20–38, 2018. https://doi.org/10.1007/978-3-319-89366-2\_2

$$\begin{array}{c}
\dfrac{\Delta;\Gamma, x:\theta \vdash M : \theta'}{\Delta;\Gamma \vdash \lambda x^{\theta}.M : \theta \to \theta'} \qquad
\dfrac{\Delta;\Gamma \vdash M : \theta \to \theta' \quad\ \Delta;\Gamma \vdash N : \theta}{\Delta;\Gamma \vdash M N : \theta'} \qquad
\Delta;\Gamma, x:\theta \vdash x : \theta \\[2ex]
\dfrac{\Delta \cup \{X\};\Gamma \vdash M : \theta}{\Delta;\Gamma \vdash \Lambda X.M : \forall X.\theta} \qquad
\dfrac{\Delta;\Gamma \vdash M : \forall X.\theta}{\Delta;\Gamma \vdash M\theta' : \theta\{\theta'/X\}} \qquad
\begin{array}{l}
(\lambda x^{\theta}.M)N =_{\beta\eta} M\{N/x\} \\
(\Lambda X.M)\theta =_{\beta\eta} M\{\theta/X\}
\end{array}
\end{array}$$

**Fig. 1.** Typing rules and βη-equality axioms.

the following: since names have no internal structure, the function has no choice but to act "the same way" on such inputs, i.e. be parametric. Our trace model yields a third notion of equivalence: trace equivalence (i.e. equality of sets of traces). Then, the result is proven by showing that trace equivalence is included in (operational) Reynolds equivalence, while it includes Strachey equivalence.

The traces in our model are formed of *moves*, which represent interactions between the modelled term (the *Player*) and its context (the *Opponent*): either Player or Opponent can interrogate the terms provided by the other, or respond to a previous such interrogation. These moves are called *questions* and *answers* respectively. Names enter the scene when calling terms of polymorphic type, in which case the calling party replaces the actual argument type θ with a *type name* α, and records locally the correspondence between α and θ. Another use of names in our model is for representing terms that are passed around as arguments to questions. These are called *computation names*, and are typed according to the term they each represent.

### **2 Definition of System F and Parametricity**

We start off by giving the definitions of System F and of the parametric equivalence relations we shall examine on it. The grammar for System F is standard and given by:

$$\begin{array}{rcl} \mathsf{Type} \ni \theta, \theta' &::=& X \mid \theta \to \theta' \mid \forall X.\theta \\ \mathsf{Term} \ni M, N &::=& x \mid \lambda x^{\theta}.M \mid \Lambda X.M \mid M N \mid M\theta \end{array}$$

We write x, etc. for *(term) variables*, sourced from a countable set Var; and X, etc. for *type variables*, taken from TVar. We define substitutions of open variables of either kind in the usual capture-avoiding way. For instance, the term obtained by consecutively applying substitutions η : Var → Term and δ : TVar → Type to M is written M{η}{δ}.
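To fix intuitions, the grammar above can be transcribed as a small abstract syntax. The following is a sketch in Python using tagged tuples; the constructor names are our own convention, not the paper's.

```python
# System F syntax as tagged tuples (our own representation, for illustration).

def var(x):        return ('var', x)         # term variable x
def lam(x, ty, m): return ('lam', x, ty, m)  # λx^θ. M
def tlam(X, m):    return ('tlam', X, m)     # ΛX. M
def app(m, n):     return ('app', m, n)      # M N
def tapp(m, ty):   return ('tapp', m, ty)    # M θ

def tvar(X):       return ('tvar', X)        # type variable X
def arr(a, b):     return ('arr', a, b)      # θ → θ'
def forall(X, a):  return ('forall', X, a)   # ∀X. θ

# The polymorphic identity id = ΛX.λx^X.x, of type Unit = ∀X.X → X.
ID = tlam('X', lam('x', tvar('X'), var('x')))
UNIT = forall('X', arr(tvar('X'), tvar('X')))
```

Terms built this way are compared structurally, which is all the later definitions (erasure, extended forms) need.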

Terms are typed in environments Δ; Γ, where Δ is a finite set of type variables, and Γ is a set {x1 : θ1, ..., xm : θm} of variable-type pairs. The typing rules are given in Fig. 1. The operational semantics we examine is βη-equality, defined as the least syntactic congruence =βη that includes the axioms given on the RHS of Fig. 1.

We shall use the following common polymorphic encodings:

$$\begin{array}{ll}
\mathbf{Unit} = \forall X.\,X\to X & \mathbf{id} = \Lambda X.\lambda x^{X}.x\\
\mathbf{Bool} = \forall X.\,X\to X\to X & \mathbf{true} = \Lambda X.\lambda x^{X}.\lambda y^{X}.x \qquad \mathbf{false} = \Lambda X.\lambda x^{X}.\lambda y^{X}.y\\
\mathbf{Nat} = \forall X.\,(X\to X)\to X\to X &
\end{array}$$
*Reynolds Equivalence.* We next introduce logical relations for System F. First, we let Rel be the set of all typed relations between closed terms that are compatible with =βη:

$$\begin{split} \mathrm{Rel} = \{(\theta_1,\theta_2,R) \mid{} & R \subseteq \mathsf{Term}\times\mathsf{Term} \wedge \forall(M_1,M_2)\in R.\ \cdot;\cdot\vdash M_i:\theta_i\\
& \wedge \forall M_1' =_{\beta\eta} M_1.\ \forall M_2' =_{\beta\eta} M_2.\ (M_1',M_2')\in R\} \end{split}$$

Logical relations $\mathcal{R}[\![\theta]\!]_\delta$ are defined below, indexed by environments δ : TVar → Rel:

$$\begin{array}{rcl}
\mathcal{R}[\![X]\!]_\delta &=& R \quad\text{when } \delta(X)=(\_,\_,R)\\
\mathcal{R}[\![\theta_1\to\theta_2]\!]_\delta &=& \{(M_1,M_2)\mid \forall(N_1,N_2)\in\mathcal{R}[\![\theta_1]\!]_\delta.\ (M_1N_1,M_2N_2)\in\mathcal{R}[\![\theta_2]\!]_\delta\}\\
\mathcal{R}[\![\forall X.\theta]\!]_\delta &=& \{(M_1,M_2)\mid \forall(\theta_1,\theta_2,R)\in\mathrm{Rel}.\ (M_1\theta_1,M_2\theta_2)\in\mathcal{R}[\![\theta]\!]_{\delta\cdot[X\mapsto(\theta_1,\theta_2,R)]}\}
\end{array}$$
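As a worked instance, unfolding the last two clauses at the type ∀X.X → X gives:

$$\mathcal{R}[\![\forall X.X\to X]\!]_\delta = \{(M_1,M_2)\mid \forall(\theta_1,\theta_2,R)\in\mathrm{Rel}.\ \forall(N_1,N_2)\in R.\ (M_1\,\theta_1\,N_1,\ M_2\,\theta_2\,N_2)\in R\}$$

so two terms are related at this type exactly when they map R-related arguments to R-related results, for every chosen relation R.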

We can now define the first notion of parametric equivalence for System F.

**Definition 1.** Given terms Δ; Γ ⊢ M1, M2 : θ, we say that they are *Reynolds equivalent*, and write Δ; Γ ⊢ M1 ≃log M2 : θ, if:

$$\forall \delta \in \mathcal{R} \lbrack \Delta \rbrack. \forall (\eta\_1, \eta\_2) \in \mathcal{R} \lbrack \Gamma \rbrack\_{\delta}. \ (M\_1 \{\eta\_1\} \{\delta\_1\}, M\_2 \{\eta\_2\} \{\delta\_2\}) \in \mathcal{R} \lbrack \theta \rbrack\_{\delta}$$

where $\mathcal{R}[\![\Delta]\!] = \mathrm{dom}(\Delta)\to\mathrm{Rel}$, $\delta_1 = \{(X,\theta_1)\mid\delta(X)=(\theta_1,\_,\_)\}$ (and similarly for $\delta_2$), and $\mathcal{R}[\![\Gamma]\!]_\delta = \{(\eta_1,\eta_2)\in(\mathrm{dom}(\Gamma)\to\mathsf{Term})^2 \mid \forall(x,\theta')\in\Gamma.\ (\eta_1(x),\eta_2(x))\in\mathcal{R}[\![\theta']\!]_\delta\}$.

The following result is standard [21].

**Theorem 2 (Fundamental Property).** *If* Δ; Γ ⊢ M : θ *then* Δ; Γ ⊢ M ≃log M : θ*.*

*Remark 3.* Note that our definition of Reynolds equivalence does not coincide with either of the definitions given in [1,20]: therein, parametricity is defined using relational logics (and accompanying proof systems), whereas here we use quantification over concrete relations over closed terms.

*Strachey Equivalence.* Another notion of parametric equivalence is defined by means of erasing types from terms. We define the *type erasure* **erase**(M) of a term M by:

$$\begin{array}{l} \mathsf{erase}(\Lambda X.M) = \mathsf{erase}(M) & \mathsf{erase}(MN) = \mathsf{erase}(M)\mathsf{erase}(N) \\ \mathsf{erase}(\lambda x^{\theta}.M) = \lambda x.\mathsf{erase}(M) & \mathsf{erase}(M\theta) = \mathsf{erase}(M) \end{array}$$

and **erase**(x) = <sup>x</sup>. Thus, **erase**(M) is an untyped <sup>λ</sup>-term. Below we overload =βη to also mean βη-equality in the untyped λ-calculus.
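The erasure equations above translate directly into a recursive function. The sketch below uses our tagged-tuple representation of terms (the constructor names are our own convention); `'ulam'` marks an untyped λ-abstraction in the image of the erasure.

```python
# Type erasure erase(·): drop all type annotations, Λ-abstractions and
# type applications, following the defining equations in the text.

def erase(m):
    tag = m[0]
    if tag == 'var':                      # erase(x) = x
        return m
    if tag == 'lam':                      # erase(λx^θ.M) = λx.erase(M)
        _, x, _ty, body = m
        return ('ulam', x, erase(body))
    if tag == 'tlam':                     # erase(ΛX.M) = erase(M)
        return erase(m[2])
    if tag == 'app':                      # erase(MN) = erase(M) erase(N)
        return ('app', erase(m[1]), erase(m[2]))
    if tag == 'tapp':                     # erase(Mθ) = erase(M)
        return erase(m[1])
    raise ValueError(f'unknown constructor: {tag}')

# erase(ΛX.λx^X.x) is the untyped identity λx.x
ID = ('tlam', 'X', ('lam', 'x', ('tvar', 'X'), ('var', 'x')))
assert erase(ID) == ('ulam', 'x', ('var', 'x'))
```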

**Definition 4.** Given terms Δ; Γ ⊢ M1, M2 : θ, we say that they are *Strachey equivalent* if **erase**(M1) =βη **erase**(M2).

It was conjectured in [1,20] that Reynolds equivalence includes Strachey equivalence. We prove this holds for the version of Reynolds equivalence given in Definition 1.

#### **Theorem 5.** *Any two Strachey equivalent terms are also Reynolds equivalent.*

It is instructive to consider why a direct approach to proving this conjecture would not work. Given Strachey equivalent terms M1, M2 of type **Bool**, suppose we want to prove them Reynolds equivalent. We therefore take (θ1, θ2, R) ∈ Rel, (N1,1, N2,1) ∈ R and (N1,2, N2,2) ∈ R, and aim to prove that (M1θ1N1,1N1,2, M2θ2N2,1N2,2) ∈ R. Ideally, we would like to show that there exists j ∈ {1, 2} such that, for all i ∈ {1, 2}, Mi θi Ni,1 Ni,2 =βη Ni,j, but that seems overly optimistic. A first trick is to use Theorem 2 to get that M2 is related to itself. Thus we obtain (M2θ1N1,1N1,2, M2θ2N2,1N2,2) ∈ R, and it would suffice to prove M1θ1N1,1N1,2 =βη M2θ1N1,1N1,2 to conclude. However, our hypothesis is merely that **erase**(M1) =βη **erase**(M2).

A possible solution to the above could be to *β-reduce* both Mi θ1 N1,1 N1,2, hoping that the distinction between the two terms vanishes. Our trace semantics provides a way to model the interaction between such a term Mi and a context • θj Nj,1 Nj,2, and to deduce properties of the normal form reached by their application via head reduction.

### **3 A Nominal Trace Semantics for System F**

In this section we introduce a trace semantics for open terms which will be our main vehicle of study for System F. The terms in our semantics will be allowed to contain special constants representing any term that could fill in their open variables (be they term or type variables). The use of names can be seen as a nominal approach to parametricity: parametric types and values are represented in our semantics by names, without internal structure. Thus, e.g. a parametric function will behave "the same way" for any input, since the latter will be nothing but a name.

Our approach follows the line of work on nominal techniques [7,19] and nominal operational game semantics [12,14]. We let the set of *names* be:

$$\mathcal{N} = \mathsf{TN} \uplus \mathsf{CN}$$

We therefore use two kinds of names: type names α, β <sup>∈</sup> TN; and computation names c, d <sup>∈</sup> CN. We will range over arbitrary names by <sup>a</sup> and variants. We extend the syntax of terms and types by including computation and type names as constants, and call the resulting syntax *namey terms and types*:

$$M,N ::= c \mid x \mid \lambda x^{\theta}.M \mid \Lambda X.M \mid MN \mid M\theta \qquad\qquad \theta,\theta' ::= \alpha \mid X \mid \theta\to\theta' \mid \forall X.\theta$$

A namey term or type is *closed* if it contains no free (type/term) variables – but it may contain names. On the other hand, a *value* is a closed term in head normal form that contains no names. We range over values with v and variants.

We will use the notation M̂, N̂, and variants, to refer jointly to namey terms and namey types. Namey terms are typed with additional typing hypotheses for the added constants. These typings are made explicit in the trace model. By abuse of terminology, we will drop the adjective "namey" and refer to the above simply as "terms" and "types". Formally speaking, namey terms and types form *nominal sets* (cf. Definition 8).

*Note 6 (what do c's and* α*'s represent?).* A computation name c represents a term that can replace the open variables of a term M. That is, in order to examine the semantics of λx^θ.M, we will look instead at M{c/x}, where c is a computation name of appropriate type. Type names α have a similar purpose, for types.

Our trace semantics is built on top of head reduction, which we recall next. Moreover, we shall be using types in *extended form*, which determines the number and types of arguments needed in order to fully apply a term of a given type.

**Definition 7.** The (standard) head reduction rules are given in Fig. 2. Head normal forms are given by the syntax on the LHS below,

$$M_{\mathsf{hnf}} ::= E[x] \mid E[c] \mid \lambda x^{\theta}.M_{\mathsf{hnf}} \mid \Lambda X.M_{\mathsf{hnf}} \qquad\qquad E ::= \bullet \mid EM \mid E\theta$$

where E ranges over *evaluation contexts* (defined on the RHS). Evaluation contexts are typed with types of the form θ ⇝ θ′. We write E : θ ⇝ θ′ if we can derive • : θ ⊢ E : θ′.

An *extended type form* is a sequence (τ1, ..., τn, ξ) with <sup>ξ</sup> <sup>∈</sup> TVar∪TN and, for each <sup>i</sup>, <sup>τ</sup>i <sup>∈</sup> Type ∪ {∀<sup>X</sup> <sup>|</sup> <sup>X</sup> <sup>∈</sup> TVar}. Formally, the extended form of a type θ, written ext(θ), is defined by:

$$\text{ext}(\forall X.\theta) = (\forall X) :: \text{ext}(\theta) \qquad \text{ext}(\theta \to \theta') = \theta :: \text{ext}(\theta') \qquad \text{ext}(\xi) = (\xi)$$

where we write h ::t for the sequence with head h and tail t (cf. list notation). Elements of the form ∀X in these sequences are binders that bind to their right.
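The three clauses of ext(·) can be run directly on our tagged-tuple types (the representation is our own convention); ∀-binders are kept symbolically as `('forall', X)` entries of the sequence.

```python
# Extended type forms ext(θ), following the defining equations in the text.

def ext(ty):
    tag = ty[0]
    if tag == 'forall':              # ext(∀X.θ) = (∀X) :: ext(θ)
        _, X, body = ty
        return [('forall', X)] + ext(body)
    if tag == 'arr':                 # ext(θ → θ') = θ :: ext(θ')
        _, a, b = ty
        return [a] + ext(b)
    return [ty]                      # ext(ξ) = (ξ), ξ a type variable or name

UNIT = ('forall', 'X', ('arr', ('tvar', 'X'), ('tvar', 'X')))
# ext(Unit) = (∀X, X, X): a term of type Unit takes a type and a term argument.
assert ext(UNIT) == [('forall', 'X'), ('tvar', 'X'), ('tvar', 'X')]
```

The length of ext(θ) minus one is exactly the number of arguments needed to fully apply a term of type θ, which the examples of Sect. 3.1 use to decide how many names an interrogating move carries.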

We let →∗ be the reflexive-transitive closure of →. It is a standard result that → preserves typing and that head reduction (strongly) normalises to head normal forms.

We finally introduce some infrastructure for working with objects with names.

$$(\lambda x.M)N \to M\{N/x\} \quad (\Lambda X.M)\theta \to M\{\theta/X\} \quad \dfrac{M\to M'}{\lambda x.M\to\lambda x.M'} \quad \dfrac{M\to M'}{\Lambda X.M\to\Lambda X.M'} \quad \dfrac{M\to M'\ (*)}{E[M]\to E[M']}$$

**Fig. 2.** Head reduction rules. Condition (∗) stipulates that M not be a Λ/λ-abstraction.
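The rules of Fig. 2 can be sketched on erased (untyped) terms as a one-step function. This is our own simplification: it assumes all bound variables are distinct, so that naive substitution is capture-free.

```python
# One-step head reduction on untyped terms ('ulam' = untyped λ), assuming
# distinct bound variables so substitution needs no renaming.

def subst(m, x, n):
    tag = m[0]
    if tag == 'var':
        return n if m[1] == x else m
    if tag == 'ulam':
        return ('ulam', m[1], subst(m[2], x, n))
    if tag == 'app':
        return ('app', subst(m[1], x, n), subst(m[2], x, n))
    raise ValueError(tag)

def head_step(m):
    """Return the head reduct of m, or None if m is in head normal form."""
    if m[0] == 'ulam':                        # reduce under λ
        r = head_step(m[2])
        return None if r is None else ('ulam', m[1], r)
    if m[0] == 'app' and m[1][0] == 'ulam':   # (λx.M)N → M{N/x}
        return subst(m[1][2], m[1][1], m[2])
    if m[0] == 'app':                         # E[M] → E[M'], M not an abstraction
        r = head_step(m[1])
        return None if r is None else ('app', r, m[2])
    return None                               # variables are head normal

# (λx.x)(λy.y) head-reduces to λy.y
I = ('ulam', 'x', ('var', 'x'))
K = ('ulam', 'y', ('var', 'y'))
assert head_step(('app', I, K)) == K
```

Iterating `head_step` until it returns `None` computes the head normal forms of Definition 7 (restricted to erased terms).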

**Definition 8.** We call a permutation π : N → N *finite* if the set {a | π(a) ≠ a} is finite, and *component-preserving* if, for all a ∈ N, a ∈ TN iff π(a) ∈ TN.

A *nominal set* [7] is a pair (Z, ∗) of a set Z along with an action (∗) of the set of finite component-preserving permutations of N on the set Z. For each z ∈ Z, the set of names featuring in z forms its *support*, written ν(z), which we stipulate to be finite.

In the sequel, when constructing objects with names (such as moves or traces) we shall implicitly assume that these form nominal sets, where the permutation action is defined by taking π ∗ z to be the result of applying π to each name in z.

#### **3.1 Trace Semantics Preview**

Before formally presenting the trace model, we look at some examples informally, postponing the full details for the next section. Head reduction brings terms into head normal form. The trace semantics allows us to further 'reduce' terms of the form E[c M̂1 ⋯ M̂n], where c is some computation name. For such a term, following the game semantics approach [3,11], our model will issue a *move* interrogating the computation c on arguments M̂i, and put E on top of an *evaluation stack*, denoted E. The move is effectively a call to c, and E functions as a call stack which registers the calls that have been made and are still pending. This will effectively lead to a labelled transition system in which labels are moves issued by two parties: a *Player (P)*, representing the modelled term, and an *Opponent (O)*, representing its enclosing term context.

Traces are sequences of *moves*, which in turn are tuples of names belonging to one of four classes, taking c ∈ CN and ai ∈ N for each i: *O-questions* $c(a_1,\dots,a_n)$, *P-questions* $\bar c(a_1,\dots,a_n)$, and paired answers $\mathsf{OK}\,\overline{\mathsf{OK}}$ (O-starting) and $\overline{\mathsf{OK}}\,\mathsf{OK}$ (P-starting).


Given a question move as above, we let its *core name* be c. We distinguish a computation name cin ∈ CN, and call questions with core name cin *initial*. We define a *trace* T to be a finite sequence of moves. Traces will be restricted to *legal* ones in Definition 12.

In the following examples we give traces produced by simple System F terms. Traces are formally produced by an LTS over configurations whose main component is an evaluation stack. An *evaluation stack* is a stack whose elements are typed evaluation contexts, apart from the top element which can also be a typed term:

$$\mathcal{E} ::= \mathcal{E}' \mid (M, \theta) :: \mathcal{E}' \qquad \mathcal{E}' ::= \diamondsuit \mid (E, \theta \leadsto \theta') :: \mathcal{E}'$$

We denote the empty stack with ♦. In the next two examples, for simplicity, configurations shall only contain evaluation stacks.

*Example 9.* Recall that **id** = ΛX.λx^X.x : **Unit** and **Unit** = ∀X.X → X. The extended form of **Unit**, ext(**Unit**) = (∀X, X, X), indicates that **id** requires two arguments in order to be evaluated: one type and one term of that given type. Thus, the traces produced by **id** will start with an interrogating/calling move cin(α, c) played by O.


Starting from the initial move cin(α, c), a trace of **id** can be produced as follows:

$$\langle\diamondsuit\rangle \xrightarrow{c_{\mathsf{in}}(\alpha,c)} \langle(\mathbf{id}\,\alpha\,c,\ \alpha)\rangle \to \langle(c,\alpha)\rangle \xrightarrow{\bar{c}()} \langle(\bullet,\ \alpha\leadsto\alpha)\rangle \xrightarrow{\mathsf{OK}\,\overline{\mathsf{OK}}} \langle\diamondsuit\rangle$$

Thus, O starts the interaction by interrogating **id** with α, c. This results in **id** α c, which gets head reduced to c. At this point, c is a head normal form of type α, and P can answer the initial question cin(α, c). This is done in two steps. First, P further reduces c by playing a move $\bar c()$ (here c takes 0 arguments as ext(α) = (α)), and pushes the current evaluation context (•, α ⇝ α) on the stack. O then responds by triggering a pair of answers $\mathsf{OK}\,\overline{\mathsf{OK}}$, which answer both questions played so far. The resulting trace is: $c_{\mathsf{in}}(\alpha,c)\cdot\bar c()\cdot\mathsf{OK}\,\overline{\mathsf{OK}}$.

*Note 10 (what are* $\mathsf{OK}$ *and* $\overline{\mathsf{OK}}$?*).* As System F base types are type variables, there is no real need for answer moves: a type X has no return values. For example, in the game models of Hughes [9] and Laird [15], answer moves were effectively suppressed (either explicitly, or by allowing moves c(···) to function as answers). Here, to give the semantics an operational flavour, we introduce instead explicit 'dummy' answers OK.

*Example 11.* Consider now M = λf^**Unit**. f : **Unit** → **Unit**. We have that ext(**Unit** → **Unit**) = (**Unit**, ∀X, X, X), and therefore M requires three arguments for its evaluation: one term of type **Unit**, one type, and one term of that latter type. We can therefore start a trace of M with an initial move cin(c1, α1, c2) and continue as follows.

$$\langle\diamondsuit\rangle \xrightarrow{c_{\mathsf{in}}(c_1,\alpha_1,c_2)} \langle(M\,c_1\,\alpha_1\,c_2,\ \alpha_1)\rangle \to \langle(c_1\,\alpha_1\,c_2,\ \alpha_1)\rangle \xrightarrow{\bar{c}_1(\alpha_2,c_3)} \langle(\bullet,\ \alpha_2\leadsto\alpha_1)\rangle$$

Thus, the initial move leads to M c1 α1 c2, which in turn reaches the hnf c1 α1 c2, with c1 : **Unit**, and at that point P needs to invoke c1 with arguments α1 and c2. These are abstracted away by fresh names α2 and c3 respectively, which are passed as arguments to c1; c3 in particular has type α2. The result of this invocation will be of type α2, which is the hole type in (•, α2 ⇝ α1). O can only produce a term of type α2 by simply returning c3. Similarly to before, this is done in two steps: O plays c3(), which brings c2 (the term represented by c3) to the top of the stack, which in turn triggers a pair of answers and brings c2 inside the context (•, α2 ⇝ α1).

$$\langle(\bullet,\alpha_2\leadsto\alpha_1)\rangle \xrightarrow{c_3()} \langle(c_2,\alpha_2)::(\bullet,\alpha_2\leadsto\alpha_1)\rangle \xrightarrow{\overline{\mathsf{OK}}\,\mathsf{OK}} \langle(c_2,\alpha_1)\rangle \xrightarrow{\bar{c}_2()} \langle(\bullet,\alpha_1\leadsto\alpha_1)\rangle \xrightarrow{\mathsf{OK}\,\overline{\mathsf{OK}}} \langle\diamondsuit\rangle$$

The latter step leaves us with (c2, α1), which reaches ♦ as in the previous example.

#### **3.2 Definition of the LTS**

We now proceed with the formal definition of the trace semantics. We start off with a series of definitions setting the conditions for a trace to be legal.

The names appearing in a trace are owned by whoever introduces them. A move m *introduces* a name a in a trace T if m is a question q(a1, ..., an) with ai = a for some i. For each A ∈ {O, P}, we let the set of names of T that *are owned by A* be:

$$A(T) = \{a \in \mathcal{N} \mid \exists m.\ m \text{ is an } A\text{-question in } T \wedge m \text{ introduces } a\}.$$

We will be referring to the names appearing in A(T) as *A-names*.

Each move in a trace needs to be justified, i.e. depend on an earlier move (unless the move is initial). Justification is defined in different ways for questions and answers. Given a trace T and two moves m, m′ in T, we say that m *justifies* m′ when m occurs before m′ in T and:


Answering of questions is defined as follows. Each answer (occurrence) m answers the pair of question moves (m1, m2) containing the last two question moves in T which are before m and have not been answered yet.

We can now define legality conditions for traces. Below, for A ∈ {O, P}, we say that a move is *A-starting* if it is an A-question or an AA⊥-answer (where O⊥ = P and P⊥ = O). Similarly, a move is *A-ending* if it is either an A-question or an A⊥A-answer.

**Definition 12.** A trace <sup>T</sup> is said to be *legal* when, for each <sup>A</sup> ∈ {O, P}:


The conditions above can be given names (suggesting their purpose) as follows: 1. *alternation*, 2. *justification*, 3. *well-introduction*, 4. *well-calling*, 5. *well-answering*.

Each trace T has a complement, which we denote T⊥ and which is obtained from T by switching O/P in all of its moves (i.e. each $c(\vec a)$ becomes $\bar c(\vec a)$, $\mathsf{OK}\,\overline{\mathsf{OK}}$ becomes $\overline{\mathsf{OK}}\,\mathsf{OK}$, etc.). T is legal iff T⊥ is.
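Complementation is a purely syntactic flip of move ownership. The sketch below uses a move representation of our own devising: questions are `('q', player, core_name, args)` and paired answers `('a', player)`, where `player` marks who plays (for answers, who starts the pair).

```python
# Trace complementation T ↦ T⊥: switch O/P ownership on every move.

def flip(player):
    return 'P' if player == 'O' else 'O'

def complement(trace):
    # Keep the move's payload (core name, argument names) unchanged;
    # only the owning player is flipped.
    return [(m[0], flip(m[1])) + m[2:] for m in trace]

# The trace of Example 9, cin(α,c) · c̄() · OK OK‾, in this representation:
T = [('q', 'O', 'c_in', ('α', 'c')), ('q', 'P', 'c', ()), ('a', 'O')]
assert complement(complement(T)) == T   # complementation is an involution
```

The involution property `T⊥⊥ = T` is what makes the statement "T is legal iff T⊥ is" an equivalence rather than one implication.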

Traces are produced by means of a labelled transition system (LTS). The LTS comprises moves as labels and *configurations* as nodes. Each configuration contains an evaluation stack of terms and evaluation contexts that need to be evaluated, as well as maps containing type/term information on the names that have appeared so far. We introduced evaluation stacks in the previous section. Here we restrict their allowed shapes as follows. We let *passive* and *active* evaluation stacks be defined by the following two grammars respectively, and take evaluation stacks to be E ::= Epass | Eactv,

$$\mathcal{E}_{\mathsf{pass}} ::= \diamondsuit \mid [(E,\alpha\leadsto\theta)] \mid (E,\alpha\leadsto\alpha')::\mathcal{E}_{\mathsf{pass}} \qquad \mathcal{E}_{\mathsf{actv}} ::= [(M,\theta)] \mid (M,\alpha)::\mathcal{E}_{\mathsf{pass}}$$

where θ ranges over closed types with ν(θ) = ∅, and ♦ is the empty stack.

The other two components of configurations will be maps γ and φ of the shape:

$$\gamma \in (\mathsf{CN} \rightharpoonup (\mathsf{Term}\times\mathsf{Type})) \otimes (\mathsf{TN} \rightharpoonup (\mathsf{Type}\times\{\mathcal{U}\})), \qquad \phi \in (\mathsf{CN} \rightharpoonup \mathsf{Type}) \otimes (\mathsf{TN} \rightharpoonup \{\mathcal{U}\}),$$

with F ⊗ G = {f ∪ g | f ∈ F ∧ g ∈ G}. Here U is a special "universe" symbol that represents the type of types; it is only used for convenience.


The role of a map γ is to abstract away terms to computation names, and types to type names. On the other hand, a map φ simply types names. In the LTS, when P wants to interrogate an O-computation name c with some arguments, they will abstract away the actual arguments to names, record the abstraction in γ, and call c on these names. On the other hand, when O interrogates a P-computation name c with some move $c(\vec a)$, we will record in φ the types of the (new!) O-names $\vec a$.

The abstraction of arguments to names is instrumented by a dedicated operation AVal. This operation assigns to each sequence ((M̂1, τ1), ..., (M̂n, τn), ξ), where (τ1, ..., τn, ξ) is an extended type form (i.e. the type of the computation name we want to call) and each M̂i is a closed term or type (the i-th argument), a set of triples of the form $(\vec a, \gamma, \beta)$ where $\vec a$ is a sequence of names abstracting the arguments M̂i, γ records which term or type each of these names stands for, and β is the resulting return type name.


The operation is formally defined next. In the same definition we introduce the semantics of types, [[θ]], as sets of triples of the form $(\vec a, \phi, \beta)$, which represent all possible input-output name tuples $(\vec a, \beta)$ that are allowed for θ, including their typing φ.

**Fig. 3.** Reduction rules for the LTS.

**Definition 13.** Given a closed type θ (which may contain type names), we let its semantics be [[θ]] = [[ext(θ)]], where the latter is defined inductively by:

$$\begin{aligned}
[\![(\alpha)]\!] &= \{(\varepsilon,\varepsilon,\alpha)\}\\
[\![\theta::L]\!] &= \{((c,\vec a),\ \phi\cdot[c\mapsto\theta],\ \alpha) \mid c\in\mathsf{CN},\ (\vec a,\phi,\alpha)\in[\![L]\!]\}\\
[\![\forall X::L]\!] &= \{((\beta,\vec a),\ \phi\cdot[\beta\mapsto\mathcal{U}],\ \alpha) \mid \beta\in\mathsf{TN},\ (\vec a,\phi,\alpha)\in[\![L\{\beta/X\}]\!]\}
\end{aligned}$$

On the other hand, to each sequence ((M̂1, τ1), ..., (M̂n, τn), ξ) we assign a set of *abstract values* AVal(((M̂1, τ1), ..., (M̂n, τn), ξ)) inductively by:

$$\begin{aligned}
\mathrm{AVal}((\alpha)) &= \{(\varepsilon,\varepsilon,\alpha)\}\\
\mathrm{AVal}((M,\theta)::L) &= \{((c,\vec a),\ \gamma\cdot[c\mapsto(M,\theta)],\ \alpha) \mid c\in\mathsf{CN},\ (\vec a,\gamma,\alpha)\in\mathrm{AVal}(L)\}\\
\mathrm{AVal}((\theta,\forall X)::L) &= \{((\beta,\vec a),\ \gamma\cdot[\beta\mapsto(\theta,\mathcal{U})],\ \alpha) \mid \beta\in\mathsf{TN},\ (\vec a,\gamma,\alpha)\in\mathrm{AVal}(L\{\beta/X\})\}
\end{aligned}$$

Both φ and γ are finite partial functions whose domains are sets of names. For such maps, the extension notation used e.g. in φ · [c ↦ z] (for appropriate z) means *fresh* extension: φ · [c ↦ z] = φ ∪ {(c, z)}, given that c ∉ dom(φ). This notation extends to whole maps: φ · φ′ = φ ∪ φ′, given that dom(φ) ∩ dom(φ′) = ∅. Moreover, for each map γ we write fst(γ) for its first projection: fst(γ) = {(a, M̂) | γ(a) = (M̂, _)}. Similarly, the second projection is given by snd(γ) = {(a, Z) | γ(a) = (_, Z)}.

**Definition 14.** A *configuration* is a triple ⟨E, γ, φ⟩ where E is an evaluation stack and γ and φ are as above. The reduction rules of the LTS are given in Fig. 3. We write Tr(C) for the set of traces generated by a configuration C.

Given a typed term Δ; Γ ⊢ M : θ, with Δ = {X1, ..., Xn} and Γ = {x1 : θ1, ..., xm : θm}, we set ⟨⟨Δ; Γ ⊢ M : θ⟩⟩ = ⟨♦, [cin ↦ (M′, θ′)], ε⟩ and:

[[Δ; Γ ⊢ M : θ]] = {T ∈ Tr(⟨⟨Δ; Γ ⊢ M : θ⟩⟩) | T has at most one initial move}

where θ′ = ∀X1. ... ∀Xn. θ1 → ··· → θm → θ and M′ = ΛX1. ... ΛXn. λx1^θ1. ... λxm^θm. M.

A configuration is active (resp. passive) if its evaluation stack is so. An active configuration stands for a term being computed and it may only produce P-moves. A passive configuration, on the other hand, stands for a scenario where O is next to play. Moreover, the map φ in a configuration contains information on the O-names that have been played, i.e. dom(φ) contains O-names, while dom(γ) contains P-names.

To better grasp Fig. 3, let us consider an initial configuration ⟨♦, [cin ↦ (M, θ)], ε⟩ and look at its traces, for some closed term M with empty support (so there is no need to pass to M′, θ′).


*Example 15.* In Fig. <sup>4</sup> we include example traces for terms <sup>M</sup>1, M<sup>2</sup> : **Unit** <sup>→</sup> **Unit** (taken from [1], Instance 3.25) and for the Church numerals <sup>M</sup>k : **Nat**. The former pair is an instance of Theorem 21 – Strachey equivalence implies trace equivalence.

In our scenario above we started from a passive configuration with empty stack and a singleton γ. A different way to produce a trace is to start from an active configuration with a stack containing only a term <sup>E</sup>[cinMˆ<sup>1</sup> ··· <sup>M</sup>ˆn], in which case the rule (PQ0) would commence the trace. More generally, we call a configuration C with stack E:

$$M_k = \Lambda X.\lambda f^{X\to X}.\lambda x^{X}.N_{f,x,k} \qquad\qquad N_{M_f,M_x,k} = \underbrace{M_f(M_f(\cdots(M_f}_{k}\,M_x)\cdots))$$

**Fig. 4.** Top: traces for two terms of type **Unit**→**Unit**. Bottom: traces for Church numeral Mk.
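After erasure, the Church numerals of Fig. 4 are just iterators, which can be transcribed into Python directly (a sketch of their extensional behaviour, not of the paper's typed terms).

```python
# Church numeral M_k, erased: given f and x, apply f to x exactly k times.

def numeral(k):
    def with_f(f):
        def with_x(x):
            result = x
            for _ in range(k):        # N_{f,x,k} = f(f(...(f x)...)), k times
                result = f(result)
            return result
        return with_x
    return with_f

assert numeral(3)(lambda n: n + 1)(0) == 3
```

Two numerals Mk, Mk′ are Strachey equivalent exactly when k = k′, since their erasures are distinct βη-normal forms otherwise.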


Each reduction sequence in the LTS can only contain either term or context configurations. In our discussion above and in Example 15 we examine the semantics of terms, and therefore use term configurations. In later sections, when we shall start looking at the semantics of contexts, we will be using context configurations as well.

While we have not defined leaves for our LTS, there is a natural notion of a trace being "completed". In particular, we call a trace T *complete* if all its questions have been answered. We write CTr(C) for the set of complete traces generated from C. Term and context configurations can both produce complete traces. Given a term configuration C and a complete trace T, we write C ⇓T if $C \xrightarrow{T} C'$ and C′ has an empty evaluation stack. On the other hand, given a context configuration C, a complete trace T and a value v, we write C ⇓T,v if $C \xrightarrow{T} C'$ and C′ has an evaluation stack with a single element (v, θ).

**Lemma 16.** *Given a term configuration* <sup>C</sup> *and* <sup>T</sup> <sup>∈</sup> Tr(C)*, then* <sup>T</sup> *is complete iff* <sup>C</sup> ⇓T *.*

We conclude this section by looking at some restrictions characterising actual configurations. We first extend fst to evaluation stacks by: fst(♦) = ♦ and fst((Z, _) :: E) = Z :: fst(E).

**Definition 17.** A configuration E, γ,φ is said to be *legal* when: 


where <sup>Δ</sup>φ = dom(φ) <sup>∩</sup> TN and <sup>Γ</sup>φ,γ <sup>=</sup> {(x, θ{fst(γ)}) <sup>|</sup> (x, θ) <sup>∈</sup> <sup>φ</sup>}.

**Lemma 18.** *If* C *is a legal configuration and* $C \xrightarrow{m} C'$ *then* C′ *is a legal configuration.*

### **4 Parametricity in the Trace Model, and Proof of Theorem 5**

We next examine the relationship between trace equivalence and the notions of Reynolds and Strachey equivalence. We prove that Strachey equivalence is included in trace equivalence (Theorem 21), which in turn is included in Reynolds equivalence (Theorem 28).

### **4.1 From Strachey to Trace Equivalence**

**Definition 19.** Let <sup>C</sup>i <sup>=</sup> Ei, γi, φi, for <sup>i</sup> = 1, 2, be two configurations. We say that C<sup>1</sup> and C<sup>2</sup> are *Strachey-equivalent* when E<sup>1</sup> and E<sup>2</sup> have the same size, dom(γ1) = dom(γ2), φ<sup>1</sup> = φ<sup>2</sup> and:


where <sup>E</sup><sup>1</sup> <sup>=</sup>βη <sup>E</sup><sup>2</sup> just if <sup>E</sup>1[x] =βη <sup>E</sup>2[x] for some/all fresh <sup>x</sup>.

The first inclusion can then be proven as follows.

**Lemma 20.** *Given two Strachey-equivalent legal configurations* C1, C2*, if* $C_1 \xrightarrow{m} C_1'$ *for some* m, C1′*, then there is* $C_2 \xrightarrow{m} C_2'$ *such that* C1′ *and* C2′ *are Strachey-equivalent.*

**Theorem 21.** *For all Strachey-equivalent* Δ; Γ ⊢ M1, M2 : θ*, we have* [[M1]] = [[M2]]*.*

*Proof.* Taking T ∈ [[Δ; Γ M<sup>1</sup> : θ]], we prove that T ∈ [[Δ; Γ M<sup>2</sup> : θ]] by induction on the length of T, using the previous lemma.

The inclusion above is strict. This is shown, for example, by the following terms <sup>M</sup>**true**, M**false** : **Unit** <sup>→</sup> **Unit**, which are trace equivalent but not Strachey-equivalent:

$$M\_{\mathbf{b}} = \lambda f^{\mathbf{Unit}}.\Lambda X.\lambda x^X.\mathsf{snd}(f(\mathbf{Bool} \times X) \langle \mathbf{b}, x \rangle) \quad (\mathbf{b} = \mathbf{true}, \mathbf{false})$$

Here we use the impredicative encoding of product types [8]: θ1 × θ2 = ∀X.(θ1 → θ2 → X) → X, with pairing ⟨M, N⟩ = ΛX.λf^{θ1→θ2→X}. f M N and **snd** = λx^{θ1×θ2}. x θ2 (λy^{θ1}.λz^{θ2}.z). Setting γ0 = [cin ↦ (Mb, **Unit** → **Unit**)] and Cb = ⟨⟨·; · ⊢ Mb : **Unit** → **Unit**⟩⟩, we have:

$$\begin{aligned}
C_{\mathbf{b}} &\xrightarrow{c_{\mathsf{in}}(c_f,\alpha,c)} \langle(\mathbf{snd}(c_f(\mathbf{Bool}\times\alpha)\langle\mathbf{b},c\rangle),\ \alpha),\ \gamma_0,\ \phi_0\rangle && (\phi_0 = [c_f\mapsto\mathbf{Unit},\ \alpha\mapsto\mathcal{U},\ c\mapsto\alpha])\\
&\xrightarrow{\bar c_f(\beta,c')} \langle(\mathbf{snd}\,\bullet,\ \beta\leadsto\alpha),\ \gamma_1,\ \phi_0\rangle && (\gamma_1 = \gamma_0\cdot[\beta\mapsto(\mathbf{Bool}\times\alpha,\mathcal{U}),\ c'\mapsto(\langle\mathbf{b},c\rangle,\beta)])\\
&\xrightarrow{c'()} \langle(\langle\mathbf{b},c\rangle,\beta)::(\mathbf{snd}\,\bullet,\ \beta\leadsto\alpha),\ \gamma_1,\ \phi_0\rangle
\xrightarrow{\overline{\mathsf{OK}}\,\mathsf{OK}} \langle(\mathbf{snd}\langle\mathbf{b},c\rangle,\ \alpha),\ \gamma_1,\ \phi_0\rangle\\
&\to \langle(c,\alpha),\ \gamma_1,\ \phi_0\rangle
\xrightarrow{\bar c()} \langle(\bullet,\ \alpha\leadsto\alpha),\ \gamma_1,\ \phi_0\rangle
\xrightarrow{\mathsf{OK}\,\overline{\mathsf{OK}}} \langle\diamondsuit,\ \gamma_1,\ \phi_0\rangle
\end{aligned}$$

and this is the only complete trace in [[M**b**]]. Indeed, O cannot interrogate another name, as cin can only be played once, and c cannot be played with the (OQ0) rule.

The other inclusion (trace equivalence included in Reynolds equivalence) is more challenging and requires machinery relating the semantics of a term and of a context to the semantics of their composition.

#### **4.2 Composite LTS**

We let a *composite configuration* be a tuple EP , <sup>E</sup>O, γP , γO, where <sup>γ</sup>P and <sup>γ</sup>O are maps <sup>γ</sup> as above, <sup>E</sup>P is a term evaluation stack, and <sup>E</sup>O is a context evaluation stack. These configurations represent the interaction between a term

#### **Fig. 5.** Composite LTS.

and a context. The term-part in the interaction is played by EP and γP, and the context-part by EO and γO. As with ordinary configurations, we define an LTS for composite ones in Fig. 5. Given a composite configuration C, a trace T and a value v (a hnf with empty support), we write C ⇓T,v when $C \xrightarrow{T} \langle\diamondsuit,\ [(v,\theta)],\ \gamma_P',\ \gamma_O'\rangle$.

Composite configurations allow us to compose a term and a context semantically: we essentially play the traces of one against the other. Another way to obtain a composite semantics is to work syntactically, i.e. by composing configurations and then executing the resulting term. This is defined next.

**Definition 22.** Given two evaluation stacks (EP , <sup>E</sup>O), we build their *merge* (which may not always be defined) <sup>E</sup>P ||EO inductively by ♦||[(M,θ)] = <sup>M</sup> and:

$$\begin{array}{l} ((M,\alpha)::\mathcal{E}\_P) || ((E,\alpha\leadsto\theta)::\mathcal{E}\_O) = \mathcal{E}\_P || ((E[M],\theta)::\mathcal{E}\_O) \\\ ((E,\alpha\leadsto\theta)::\mathcal{E}\_P) || ((M,\alpha)::\mathcal{E}\_O) = ((E[M],\theta)::\mathcal{E}\_P) || \mathcal{E}\_O \end{array}$$

When it is defined, we say that E_P, E_O are *compatible*. A composite configuration C = ⟨E_P, E_O, γ_P, γ_O⟩ is then *legal* when (E_P, E_O) are compatible and both ⟨E_P, γ_P, snd(γ_O)⟩ and ⟨E_O, γ_O, snd(γ_P)⟩ are legal.
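The merge of Definition 22 can be sketched operationally. The following Python model is hypothetical (not from the paper): term frames (M, α) become tuples `("tm", M, alpha)`, context frames (E, α ↝ θ) become `("ctx", E, alpha, theta)`, and the plugging operation E[M] is abstracted as a callable.

```python
# Hypothetical model of Definition 22. A term frame (M, α) is the tuple
# ("tm", M, alpha); a context frame (E, α ↝ θ) is ("ctx", E, alpha, theta),
# where E is a callable standing for the plugging operation E[M].
def merge(EP, EO):
    if not EP:
        # base case: ♦ || [(M, θ)] = M
        tag, M, _theta = EO[0]
        assert tag == "tm" and len(EO) == 1, "stacks are not compatible"
        return M
    if EP[0][0] == "tm" and EO[0][0] == "ctx":
        _, M, alpha = EP[0]
        _, E, alpha2, theta = EO[0]
        assert alpha == alpha2, "continuation names must match"
        # ((M, α) :: E_P) || ((E, α ↝ θ) :: E_O) = E_P || ((E[M], θ) :: E_O)
        return merge(EP[1:], [("tm", E(M), theta)] + EO[1:])
    if EP[0][0] == "ctx" and EO[0][0] == "tm":
        _, E, alpha, theta = EP[0]
        _, M, alpha2 = EO[0]
        assert alpha == alpha2, "continuation names must match"
        # ((E, α ↝ θ) :: E_P) || ((M, α) :: E_O) = ((E[M], θ) :: E_P) || E_O
        return merge([("tm", E(M), theta)] + EP[1:], EO[1:])
    raise ValueError("stacks are not compatible")
```

When no clause applies, the stacks are incompatible, matching the partiality of the merge.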

We now relate the reduction of a composite configuration to the head reduction of the merge of its two evaluation stacks. First, taking the two environments γ_P, γ_O of a legal composite configuration, we compute their *closure* (γ_P · γ_O)* as follows. Setting γ_0 = fst(γ_P · γ_O), and γ_i = {(a, M̂{γ_{i−1}}) | (a, M̂) ∈ γ_{i−1}} for i > 0, there is an integer n such that ν(cod(γ_n)) = ∅. We write (γ_P · γ_O)* for the environment γ_n, for the least n satisfying this latter condition.
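Concretely, the closure is a substitute-until-name-free iteration. Below is a hypothetical sketch (our own encoding, not the paper's), with terms modelled as token lists and the environment as a dict from names to terms; termination relies on the paper's assumption that some γ_n has a name-free codomain.

```python
# Hypothetical sketch of the closure (γ_P · γ_O)*: repeatedly substitute the
# environment into its own codomain until a fixpoint is reached.
def close(gamma):
    g = dict(gamma)
    while True:
        new_g = {}
        for name, term in g.items():
            expanded = []
            for tok in term:
                # replace a name by its definition, keep other tokens
                expanded.extend(g[tok] if tok in g else [tok])
            new_g[name] = expanded
        if new_g == g:   # fixpoint: no environment name left to expand
            return g
        g = new_g
```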

**Theorem 23.** *Given a legal composite configuration* C = ⟨E_P, E_O, γ_P, γ_O⟩*, we have* C ⇓_{T,v} *iff* (E_P‖E_O){(γ_P · γ_O)*} →* v*.*

Finally, we relate the LTS's for composite configurations and ordinary configurations (Theorem 26). Combined with Theorem 23, this gives us a correlation between the traces of two compatible configurations and the head reduction we obtain once we merge their evaluation stacks.

**Definition 24.** Given legal configurations C_P = ⟨E_P, γ_P, φ_P⟩ and C_O = ⟨E_O, γ_O, φ_O⟩, we say that they are *compatible* when E_P, E_O are compatible, snd(γ_P) = φ_O and snd(γ_O) = φ_P. For each pair (C_P, C_O) of compatible configurations, we define their merge C_P ∧∧ C_O as the composite configuration ⟨E_P, E_O, γ_P, γ_O⟩.

**Lemma 25.** *Taking* (C_P, C_O) *a pair of compatible configurations,* C_P ∧∧ C_O ⇓_{T,v} *iff* C_P ⇓_T *and* C_O ⇓_{T⊥,v}*.*

**Theorem 26.** *Given* C_{P,1}, C_{P,2}, C_O *such that* (C_{P,1}, C_O) *and* (C_{P,2}, C_O) *are pairwise compatible and* Tr(C_{P,1}) = Tr(C_{P,2})*, if* C_{P,1} ∧∧ C_O ⇓_{T,v}*, then* C_{P,2} ∧∧ C_O ⇓_{T,v}*.*

*Proof.* From Lemma 25 we get C_{P,1} ⇓_T and C_O ⇓_{T⊥,v}. Thus T ∈ Tr(C_{P,1}) and hence T ∈ Tr(C_{P,2}). Lemma 16 then yields C_{P,2} ⇓_T and, from Lemma 25, C_{P,2} ∧∧ C_O ⇓_{T,v}.

#### **4.3 Proof of Theorem 5**

Theorem 5 follows from Theorems 21 and 28. Theorem 28, which is proved below, shows that any two trace-equivalent terms are also Reynolds equivalent. This is achieved as follows. In the previous section we saw how to relate reductions of terms-in-context to the semantics of terms and contexts. Given trace-equivalent terms M_1, M_2, and fully applying them to related arguments, we obtain head reductions to values. These reductions can be decomposed into LTS reductions producing corresponding traces, for the terms and their argument terms (which form contexts). But, since the terms are trace equivalent, M_2 can simulate the behaviour of M_1 in the context of M_1, which allows us to show that the two composites reduce to the same value.

We start by extending logical relations to extended types with empty support. We define <sup>R</sup>[[ext(θ)]]δ by:

$$\begin{array}{l}
\mathcal{R}[\![X]\!]\_{\delta} = \{ R \mid \delta(X) = (\square, \square, R) \} \\
\mathcal{R}[\![\theta :: L]\!]\_{\delta} = \{ (M\_1, N\_1) :: L' \mid (M\_1, N\_1) \in \mathcal{R}[\![\theta]\!]\_{\delta} \wedge L' \in \mathcal{R}[\![L]\!]\_{\delta} \} \\
\mathcal{R}[\![\forall X :: L]\!]\_{\delta} = \{ (\theta\_1, \theta\_2) :: L' \mid (\theta\_1, \theta\_2, R) \in \mathrm{Rel} \wedge L' \in \mathcal{R}[\![L]\!]\_{\delta} \}
\end{array}$$

**Lemma 27.** (M_1, M_2) ∈ R[[θ]]_δ *iff for all* ((N̂^1_1, N̂^1_2), ..., (N̂^n_1, N̂^n_2), R) ∈ R[[ext(θ)]]_δ*, we have* (M_1 N̂^1_1 ⋯ N̂^n_1, M_2 N̂^1_2 ⋯ N̂^n_2) ∈ R*.*

**Theorem 28.** *For all trace equivalent* Δ; Γ ⊢ M_1, M_2 : θ*, we have that* M_1 ≃_log M_2*.*

*Proof.* Taking δ ∈ R[[Δ]] and (η_1, η_2) ∈ R[[Γ]]_δ, we show (M_1{η_1}{δ_1}, M_2{η_2}{δ_2}) ∈ R[[θ]]_δ. Using Lemma 27, we take ((N̂^1_1, N̂^1_2), ..., (N̂^n_1, N̂^n_2), R) ∈ R[[ext(θ)]]_δ, and prove that (M_1{η_1}{δ_1} N̂^1_1 ⋯ N̂^n_1, M_2{η_2}{δ_2} N̂^1_2 ⋯ N̂^n_2) ∈ R. For each i ∈ {1, 2}, there exists a value v_i such that M_i{η_i}{δ_i} N̂^1_i ⋯ N̂^n_i →* v_i. Using the closure of R w.r.t. =βη, it suffices to show that (v_1, v_2) ∈ R. Suppose Δ = X_1, ..., X_k and Γ = x_1 : θ_1, ..., x_m : θ_m. We write C_{P,i} for the configuration ⟨Δ; Γ ⊢ M_i : θ⟩, and C_{O,i} for the configuration ⟨c_in δ_i(X_1) ⋯ δ_i(X_k) η_i(x_1) ⋯ η_i(x_m) N̂^1_i ⋯ N̂^n_i, ε, [c_in ↦ θ']⟩, where θ' = ∀X_1. ... ∀X_k. θ_1 → ⋯ → θ_m → θ.

From Theorem 23, for each i ∈ {1, 2} there is a trace T_i such that C_{P,i} ∧∧ C_{O,i} ⇓_{T_i,v_i}. M_1, M_2 being trace equivalent, we have Tr(C_{P,1}) = Tr(C_{P,2}). So from Theorem 26 we get C_{P,2} ∧∧ C_{O,1} ⇓_{T_1,v_1}, and from Theorem 23 that M_2{η_1}{δ_1} N̂^1_1 ⋯ N̂^n_1 →* v_1. Finally, from Theorem 2, we get that (M_2{η_1}{δ_1} N̂^1_1 ⋯ N̂^n_1, M_2{η_2}{δ_2} N̂^1_2 ⋯ N̂^n_2) ∈ R. Thus, using the closure of R w.r.t. =βη, we have (v_1, v_2) ∈ R.

### **5 Related and Future Work**

The literature on parametric polymorphism is vast; here we look at the works closest to ours, which come from the game semantics area. The first game model for System F was introduced by Hughes [9,10]. The model is intensional, in the sense that it is fully complete for βη-equivalence. Starting from that model, de Lataillade [5,6] characterised parametricity categorically via the notion of dinaturality [4]. In [2], Abramsky and Jagadeesan developed a model for System F to characterise genericity, as introduced by Longo et al. [17]. A type θ is said to be *generic* when two terms M_1, M_2 of type ∀X.θ are equivalent exactly when M_1θ and M_2θ are equivalent. Their model contains several generic types. More recently, Laird [15] has introduced a game model for System F augmented with mutable variables. His model is closer to ours than the previous ones; in particular, his notion of copycat links can be seen as connected to the use of names for parametricity.

In all of the above models the denotation of terms is built compositionally by induction on the structure of the term. In a different line of work, closer in spirit to our model, Lassen and Levy [16] have introduced normal form bisimulations for a language with parametric polymorphism. These bisimulations are defined on LTSs whose definition has similarities with ours. However, the model is for a CPS-style language which has not only polymorphic but also recursive types. Finally, our own model for a higher-order polymorphic language with general references [13] can be seen as a direct precursor to this work, albeit in a very different setting (call-by-value, with references).

Further on, we would like to study the existence of generic types in our model, as well as its dinaturality properties. We would moreover like to examine coarser notions of trace equivalence that bring us closer to Reynolds polymorphism. Finally, we would like to see if the trace model can be used to prove the original conjecture of [1,20]. While this seems plausible in principle, proving equivalences using definable logical relations requires additional tools, such as restrictions on the LTS, to avoid circular reasoning.

**Acknowledgement.** The authors were supported by the LABEX MILYON (ANR-10-LABX-0070) of Université de Lyon, and by the EPSRC (EP/P004172/1), respectively.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Categorical Combinatorics for Non Deterministic Strategies on Simple Games**

Clément Jacq and Paul-André Melliès

Institut de Recherche en Informatique Fondamentale, Université Paris Diderot, Paris, France Clement.Jacq@irif.fr

**Abstract.** The purpose of this paper is to define in a clean and conceptual way a non-deterministic and sheaf-theoretic variant of the category of simple games and deterministic strategies. One thus starts by associating to every simple game a presheaf category of non-deterministic strategies. The bicategory of simple games and non-deterministic strategies is then obtained by a construction inspired by the recent work by Melliès and Zeilberger on type refinement systems. We show that the resulting bicategory is symmetric monoidal closed and cartesian. We also define a 2-comonad which adapts the Curien-Lamarche exponential modality of linear logic to the 2-dimensional and non-deterministic framework. We conclude by discussing in what sense the bicategory of simple games defines a model of non-deterministic intuitionistic linear logic.

### **1 Introduction**

A new generation of 2-categorical and sheaf-theoretic game semantics is currently emerging in the field of programming language semantics. The games and strategies which determine them are more sophisticated mathematically, and also more difficult to define rigorously, than they were in the deterministic case. For that reason, it is timely to examine more closely the 2-categorical and sheaf-theoretic frameworks available to us in order to formulate these games and strategies in a suitably clean and conceptual way. In this investigation, one benefits from the efforts made in the past twenty-five years to give a clearer mathematical status to the previous generation of game semantics, which was (to a large extent) based on the notion of arena game. We recognize three main lines of work here:


3. the concurrent and asynchronous approach advocated by Melliès, based on the description of arena games as asynchronous games, and of strategies as causal concurrent structures playing on them, either in an alternated [9–11] or in a non-alternated way [18].

Interestingly, all the sheaf-theoretic frameworks designed for game semantics today are offspring of the third approach based on asynchronous games: on the one hand, the notion of concurrent strategy in [19] is a sheaf-theoretic transcription of the notion of receptive ingenuous strategy formulated in [18]; on the other hand, the sheaf-theoretic notion of non-deterministic innocent strategy in [13,17] relies on the diagrammatic and local definition of innocence in alternated asynchronous games [11]. For that reason, our purpose in this paper is to investigate the connection with the second approach, different in spirit and design, and to define a bicategory of simple games and non-deterministic strategies in the sheaf-theoretic style of Harmer et al. [4]. As we will see, our work also integrates a number of elements coming from the first approach, and more specifically, the discovery by Melliès that strategies are presented by generators and relations, and for that reason are prone to factorisation theorems [14,15]. Since we are interested in sheaf-theoretic models of computations, we should not forget to mention the pioneering work by Hirschowitz and Pous on models of process calculi [5], and its recent connection to game semantics [2].

In the present paper, we start from the category G of simple games and deterministic strategies between them, and we explain how to turn G into a bicategory S of simple games and *non-deterministic* strategies. As we will see, the construction of S relies on the discovery of a number of elementary but fundamental fibrational properties of the original category G. Since our work is built on [4], let us recall that a simple game A is defined there as a contravariant presheaf <sup>A</sup> : <sup>ω</sup>op <sup>→</sup> **Set** over the order category <sup>ω</sup> <sup>=</sup> <sup>0</sup> <sup>→</sup> <sup>1</sup> <sup>→</sup> <sup>2</sup> →··· associated to the infinite countable ordinal ω. A simple game A is thus a family of sets A<sup>n</sup> together with a function <sup>π</sup><sup>n</sup> : <sup>A</sup><sup>n</sup>+1 <sup>→</sup> <sup>A</sup><sup>n</sup> for all <sup>n</sup> <sup>∈</sup> <sup>N</sup>, depicted as:

$$A\_0 \xleftarrow{\pi\_0} A\_1 \xleftarrow{\pi\_1} A\_2 \xleftarrow{\pi\_2} \cdots \xleftarrow{\pi\_{n-1}} A\_n \xleftarrow{\pi\_n} A\_{n+1} \xleftarrow{\pi\_{n+1}} \cdots$$

One requires moreover that A_0 is the singleton set. The intuition is that A is a rooted tree; that A_n contains its plays (or branches) of length n; and that π_n is the prefix function which transports every play of length n + 1 to its prefix of length n. In particular, every simple game A contains only one play of length 0, which should be thought of as the empty play. Every simple game A should moreover be understood as alternating: here, the intuition is that every play of odd length 2n + 1 ends with an Opponent move, and that every play of even length 2n ends with a Player move if n > 0.
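These prefix conditions are easy to state concretely. In the following illustrative Python encoding (our own, not the paper's), a position is identified with its occurrence, i.e. the sequence of moves leading to it, so that π_n literally drops the last move.

```python
# Illustrative encoding of a simple game: positions of degree n are move
# sequences of length n, and the prefix function drops the last move.
def prefix(position):          # π_n : A_{n+1} -> A_n
    return position[:-1]

# the boolean game B: one Opponent question q, Player answers tt / ff
B = [
    {()},                       # A_0: the unique empty play
    {("q",)},                   # A_1: odd length, ends with an O-move
    {("q", "tt"), ("q", "ff")}  # A_2: even length, ends with a P-move
]

assert len(B[0]) == 1           # A_0 must be a singleton
assert all(prefix(p) in B[n] for n in range(2) for p in B[n + 1])
```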

**Terminology:** An element a ∈ A_n is called a position of degree n in the game A. The position a ∈ A_n is called a P-position when its degree n is even, and an O-position when its degree n is odd. Given a position a ∈ A_{n+1}, we write π(a) for the position π_n(a); similarly, given a position a ∈ A_{n+2}, we write π²(a) for the position π_n ∘ π_{n+1}(a). A simple game A is called O-branching when the function π : A_{2n+2} → A_{2n+1} is injective, for all n ∈ N. This means that every Opponent position a ∈ A_{2n+1} can be extended in at most one way into a Player position b ∈ A_{2n+2}, for all n ∈ N.

We start the paper by formulating a sheaf-theoretic notion of non-deterministic P-strategy on a simple game A. Recall that a deterministic P-strategy σ of a simple game A is defined in [4] as a family of subsets σ_{2n} ⊆ A_{2n} of P-positions, satisfying the following properties, for all n ∈ N:


In order to generalize this definition to non-deterministic P-strategies, we find it convenient to consider the full subcategory ω_P of ω consisting of the strictly positive even numbers, of the form 2n for n > 0, together with the inclusion functor ι_P : ω_P → ω. Define the presheaf A_P = A ∘ ι_P as the restriction of the presheaf A : ω^op → **Set** to the subcategory ω_P:

$$A\_P \qquad = \quad \quad \omega\_P^{op} \xrightarrow{\iota\_P} \omega^{op} \xrightarrow{A} \mathbf{Set}$$

The collection A_P thus consists of all the Player positions in A, except for the initial one ∗ ∈ A(0). This leads us to the following definition of (non-deterministic) P-strategy on a simple game A:

**Definition 1.** *A* P*-strategy* σ *on a simple game* A *is a presheaf* S : ω_P^op → **Set** *over the category* ω_P *together with a morphism of presheaves* σ : S → A_P*. We write* σ : A *in that case. The presheaf* S *is called the* support *of the strategy* σ*, and the elements of* S_{2n} *are called the runs of degree* 2n *of the strategy, for* n > 0*.*

In other words, a P-strategy σ on A is a family of sets S_{2n} indexed by strictly positive numbers n > 0, related between them by functions (π_P)_{2n} : S_{2n+2} → S_{2n}, pictured as:

$$S\_2 \xleftarrow{(\pi\_P)\_2} S\_4 \xleftarrow{} \cdots \xleftarrow{} S\_{2n} \xleftarrow{(\pi\_P)\_{2n}} S\_{2n+2} \xleftarrow{} \cdots$$

together with a family of functions σ_{2n} : S_{2n} → A_{2n} commuting with the prefix maps, in the sense that σ_{2n} ∘ (π_P)_{2n} = π² ∘ σ_{2n+2}, for all n > 0.
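The commutation requirement can be spelled out as an executable check. In this hypothetical sketch (our own encoding), S is given by its sets of runs of degree 2n, σ by per-degree dictionaries, and the maps (π_P)_{2n} and π² by dictionaries as well.

```python
# Hypothetical check that σ is a morphism of presheaves over ω_P:
# restricting in S and then applying σ must equal applying σ and then
# taking the double prefix π∘π in A.
def is_strategy(S, sigma, pi_P, pi2):
    # S[n]: runs of degree 2n; sigma[n]: dict S[n] -> A_{2n};
    # pi_P[n]: dict S[n+1] -> S[n]; pi2[n]: dict A_{2n+2} -> A_{2n}.
    return all(sigma[n][pi_P[n][s]] == pi2[n][sigma[n + 1][s]]
               for n in pi_P for s in S[n + 1])
```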

To every simple game A, we associate the category P(A) of P-strategies over A, defined as the slice category

$$\mathcal{P}(A) = \left[ \omega\_P^{op}, \mathbf{Set} \right] / A\_P \tag{1}$$

whose objects are thus the strategies over A, and whose morphisms θ : σ → τ between two strategies σ : S → A_P and τ : T → A_P are the morphisms θ : S → T of presheaves satisfying the expected equation σ = τ ∘ θ. We call these morphisms *simulations*. One main contribution of the paper is the observation that the family of categories P(A) can be organised into a pseudofunctor

$$
\mathcal{P} : \mathcal{G} \longrightarrow \mathbf{Cat}
$$

from the category G of simple games and deterministic strategies. The pseudofunctor P is moreover monoidal, in the sense that there exists a family of functors

$$m\_{A,B}: \mathcal{P}(A) \times \mathcal{P}(B) \longrightarrow \mathcal{P}(A \otimes B)$$

indexed by simple games A, B. As a symmetric monoidal closed category, the category G is enriched over itself, with the simple game G(A, B) = A ⊸ B constructed from the simple games A and B. Here comes the nice point of the construction: the bicategory S is simply defined as the bicategory with simple games A, B as objects, and with

$$\mathcal{S}(A,B) = \mathcal{P}(A \multimap B)$$

as category of morphisms between two simple games A and B. In other words, a morphism σ : A → B in S is a P-strategy σ : A ⊸ B, and a 2-cell θ : σ ⇒ τ : A → B is a morphism θ : σ → τ in the category P(A ⊸ B). At this point, the fact that S defines a bicategory is easily derived from the lax monoidal structure of the pseudofunctor P. From a conceptual point of view, the construction of the bicategory S thus amounts to a change of enrichment category along the lax monoidal pseudofunctor P : G → **Cat**, transforming the G-enriched category G into the (weak) **Cat**-enriched category S.

Besides the construction of S, a great care will be devoted to the analysis of the Curien-Lamarche exponential comonad ! on the category G and to the recipe to turn it into an exponential 2-comonad on the bicategory S. The construction relies on the existence of a family of functors

$$p\_A \quad : \quad \mathcal{P}(A) \longrightarrow \mathcal{P}(!A),$$

called "promotion" functors, and natural in the simple game A in the category G. In particular, the functorial part of the exponential 2-comonad ! : S → S is defined as the composite:

$$\mathcal{P}(A \multimap B) \xrightarrow{p\_{A \multimap B}} \mathcal{P}(!(A \multimap B)) \xrightarrow{\mathcal{P}(n\_{A,B})} \mathcal{P}({!A} \multimap {!B})$$

where n_{A,B} : !(A ⊸ B) → !A ⊸ !B is the canonical morphism in G which equips the original comonad ! : G → G with the structure of a lax monoidal functor.

### **2 Non-deterministic** *P* **-strategies as** *P* **-cartesian Transductions**

As explained in the introduction, a P-strategy σ ∈ P(A) over a simple game A is defined as an object of the slice category (1) in the category [ωop <sup>P</sup> , **Set**] of contravariant presheaves over ω<sup>P</sup> . We will use the fact that the slice category is equivalent to the category of contravariant presheaves

$$\mathcal{P}(A) \;= \; \left[ \omega\_P^{op}, \mathbf{Set} \right] / A\_P \; \cong \; \left[ \mathbf{tree} (A\_P)^{op}, \mathbf{Set} \right]$$

over the Grothendieck category **tree**(A_P) generated by the presheaf A_P ∈ [ω_P^op, **Set**]. The category **tree**(A_P) has the P-positions of the simple game A as objects, and a morphism a → a′ between a ∈ A_{2p} and a′ ∈ A_{2q} precisely when p ≤ q and π^{2q−2p}(a′) = a. In other words, it is the order category associated to the tree of P-positions of the simple game A.

We find it convenient for later purposes to reformulate non-deterministic P-strategies in the following way. This paves the way to a comprehension theorem for the pseudofunctor P, which will be established in the next section. A transduction θ : A → B between two simple games A, B : ω^op → **Set** is defined as a natural transformation between the presheaves A and B, given by a family of functions θ_n : A_n → B_n making the naturality square (n) below commute, for all n ∈ N:

A transduction θ : A → B is called P-cartesian when the square (2n) is a pullback for all n ∈ N, and O-cartesian when the square (2n+1) is a pullback for all n ∈ N. We write T for the category of simple games and transductions between them, and T_P (resp. T_O) for the subcategory of P-cartesian (resp. O-cartesian) transductions. Note that the restriction functor

$$(-)\_P \quad : \quad [\omega^{op}, \mathbf{Set}] \quad \longrightarrow \quad [\omega^{op}\_P, \mathbf{Set}]$$

is a fibration, and that a transduction θ : A → B between simple games is P-cartesian precisely when it defines a cartesian morphism with respect to the fibration (−)_P. For that reason, a P-cartesian transduction θ : A → B is entirely characterized by the family of functions θ_{2n} : A_{2n} → B_{2n} on the P-positions of the simple games A and B, for n ∈ N. From this it follows easily that:

**Proposition 1.** *A* P*-strategy* σ *on a simple game* A *is the same thing as a simple game* S *together with a* P*-cartesian transduction* S → A*. The simple game* S *is uniquely determined by* σ *up to isomorphism. It is called the support (or run-tree) of* σ*, and noted* {A | σ}*, while the* P*-cartesian transduction is noted* supp_σ : {A | σ} → A*.*

Note that the definition applies the general principle formulated in [18] that a strategy σ of a game A is a specific kind of map (here a P-cartesian transduction) S → A from a given game S = {A | σ} to the game A of interest. One benefit of this principle is that it unifies the two concepts of game and of strategy, by regarding a strategy σ of a game A as a game S "embedded" in an appropriate way by S → A inside the simple game A. This insight coming from [18] underlies for instance the construction in [19] of a category of non-deterministic strategies between asynchronous games.

Typically, consider the simple game A = B_1 ⊸ B_2 where B is the simple boolean game with a unique initial Opponent move q and two Player moves tt for true and ff for false; and where the indices 1, 2 indicate the component of the boolean game B. The simple game A may be represented as the decision tree below:

where the sets of positions are defined as:

$$A\_1 = \{a\} \qquad A\_2 = \{b, a\_1, a\_2\} \qquad A\_3 = \{b\_1, b\_2\} \qquad A\_4 = \{b\_{11}, b\_{12}, b\_{21}, b\_{22}\}$$

and where the branches are induced by the prefix functions π_n : A_{n+1} → A_n depicted on the picture above. For the reader's convenience, we label every edge of A by the name of the move which would be used in the more familiar definition of simple games, where plays are defined as sequences of moves [1,6]. Note that every position a ∈ A_n of degree n is determined by its occurrence, defined as the sequence of n moves from the root ∗ to the position a in the tree A. Typically, the P-position b ∈ A_2 has occurrence q_2 · q_1 and the P-position b_21 ∈ A_4 has occurrence q_2 · q_1 · tt_1 · ff_2.

By way of illustration, we define the P-strategy σ ∈ P(A) as the presheaf below

$$\begin{array}{c}
\ast \mapsto \{\ast\} \qquad a\_1 \mapsto \emptyset \qquad a\_2 \mapsto \{x''\} \\
b \mapsto \{x'\} \qquad b\_{11} \mapsto \emptyset \qquad b\_{12} \mapsto \emptyset \qquad b\_{21} \mapsto \{z'\} \qquad b\_{22} \mapsto \{z'', z'''\}
\end{array}$$

on the Grothendieck category **tree**(A<sup>P</sup> ) associated to the presheaf A<sup>P</sup> of Ppositions in A. As explained in Proposition 1, the P-strategy σ may be equivalently defined as the simple game S = {A | σ} below

together with the <sup>P</sup>-cartesian transduction supp <sup>σ</sup> : {<sup>A</sup> <sup>|</sup> <sup>σ</sup>} → <sup>A</sup> described as:

$$x \mapsto a \qquad x' \mapsto b \qquad x'' \mapsto a\_2 \qquad y \mapsto b\_1 \qquad z' \mapsto b\_{21} \qquad z'' \mapsto b\_{22} \qquad z''' \mapsto b\_{22}$$

It is worth mentioning that the transduction supp_σ may be recovered from the moves labelled on the run-tree S = {A | σ}. This pictorial description provides a convenient way to describe how the non-deterministic P-strategy σ plays on A. Typically, when questioned by the initial move q_2 of the game, the non-deterministic P-strategy σ either answers tt_2 with the run x″ ∈ S_2 or asks the value of the input boolean by playing the move q_1; when the Opponent answers with the move tt_1, the P-strategy reacts by playing the value ff_2 with the run z′ ∈ S_4, or by answering with the runs z″, z‴ ∈ S_4. Note in particular that the P-strategy σ is allowed to play two different runs z″, z‴ ∈ S_4 over the same play b_22 ∈ A_4.

### **3** *P* **-cartesian Transductions as Deterministic Strategies**

In the previous section, we have seen how to regard every non-deterministic P-strategy σ ∈ P(B) as a P-cartesian transduction supp_σ : {B | σ} → B into the simple game B. Our purpose here is to show that every P-cartesian transduction θ : A → B can be seen as a particular kind of deterministic strategy of the simple game A ⊸ B.

**Definition 2 (Total strategies).** *A deterministic strategy* σ *of a simple game* A *is total when for every* O*-position* s *such that the* P*-position* π(s) *is an element of* σ*, there exists a* P*-position* t *in the strategy* σ *such that* π(t) = s*.*

**Definition 3 (Back-and-forth strategies).** *Given two simple games* A *and* B*, a back-and-forth strategy* f *of the simple game* A ⊸ B *is a deterministic and total strategy whose positions are all of the form* (c, a, b) *where* c : n → n *is a copycat schedule.*

Back-and-forth strategies compose, and thus define a subcategory of G:

**Definition 4 (The category** BF**).** *The category* BF *of back-and-forth strategies is the subcategory of* G *whose objects are the simple games and whose morphisms* f : A → B *are the back-and-forth strategies of* A ⊸ B*.*

As a matter of fact, we will be particularly interested here in the subcategory BF<sup>+</sup> of *functional* back-and-forth strategies in the category BF.

**Definition 5 (Functional strategies).** *A functional strategy* f *of the simple game* A ⊸ B *is a back-and-forth strategy such that for every position* a ∈ A_n *of degree* n *in the simple game* A*, there exists a unique position* b ∈ B_n *of the same degree in* B *such that* (c, a, b) ∈ f*, where* c : n → n *is the copycat schedule.*
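The uniqueness requirement says that a functional strategy is the graph of a function on positions. A hypothetical extraction (our own sketch), with a strategy given as a set of triples (c, a, b):

```python
# Hypothetical sketch: read off the underlying function on positions from a
# functional strategy, checking the uniqueness of b for each a.
def to_transduction(f):
    theta = {}
    for (c, a, b) in f:
        # setdefault records the first b seen for a; a second, different b
        # would violate functionality
        assert theta.setdefault(a, b) == b, "b is not unique for " + str(a)
    return theta
```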

The following basic observation justifies our interest in the notion of functional strategy:

**Proposition 2.** *For all simple games* A*,* B*, there is a one-to-one correspondence between the* P*-cartesian transductions* A → B *and the functional strategies in* A ⊸ B*.*

*Proof.* See Appendix E.

For that reason, we will identify P-cartesian transductions and functional strategies from now on. Put together with Proposition 1, this leads us to the following correspondence, which holds for every simple game A:

**Proposition 3.** *The category* P(A) *is equivalent to the slice category* BF+/A*.*

The result may be understood as a preliminary form of comprehension: it states that every non-deterministic P-strategy σ ∈ P(A) may be equivalently seen as a functional P-strategy

$$\mathsf{supp}\_{\sigma} \quad : \quad \{A \mid \sigma\} \quad \longrightarrow \quad A \tag{2}$$

in the category G of simple games and deterministic strategies, obtained by composing the equivalences stated in Propositions 1 and 3. Note that the simple game {A | σ} coincides with the run-tree S of the non-deterministic strategy σ formulated in Proposition 1, and that the functional strategy supp_σ coincides with the P-cartesian transduction which "projects" the support S onto the simple game A. The property (Proposition 3) is important from a methodological point of view, because it enables us to use the rich toolbox developed for simple games and deterministic strategies in order to handle non-deterministic strategies *inside* the category G.

### **4 The Pseudofunctor P**

Suppose given a P-strategy σ ∈ P(A) over the simple game A and a morphism f : A → B in the category G.

**Definition 6.** *The* <sup>P</sup>*-strategy* <sup>P</sup>(f)(σ) <sup>∈</sup> <sup>P</sup>(B) *over the simple game* <sup>B</sup> *is defined as the contravariant presheaf over* **tree**(B<sup>P</sup> ) *which transports every* P*position* b *of the simple game* B *to the disjoint union defined below:*

$$\mathcal{P}(f)(\sigma) \quad : \quad b \quad \mapsto \coprod\_{(e,a,b) \in f} \sigma(a). \tag{3}$$

The fact that (3) defines a presheaf over **tree**(B^P), and hence a P-strategy over B, and that P is a pseudofunctor (see Definition 24), is established in Appendix F.
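Read concretely, formula (3) can be prototyped on finite data. In the sketch below, positions and schedules are opaque tokens, a deterministic strategy is a finite set of triples (e, a, b), and a P-strategy assigns to each position its set of runs; this encoding and the names `push_forward`, `f`, `sigma` are illustrative assumptions, not notation from the paper.

```python
# Positions and schedules are opaque hashable tokens; a deterministic
# strategy f is a finite set of triples (e, a, b), and a P-strategy
# sigma maps each P-position a to its set of runs.

def push_forward(f, sigma, b):
    """P(f)(sigma)(b): the disjoint union of sigma(a) over all triples
    (e, a, b) in f, with disjointness enforced by tagging each run x
    with the pair (e, a) it comes from."""
    return {((e, a), x)
            for (e, a, b2) in f if b2 == b
            for x in sigma.get(a, set())}

# Two triples of f sit over the same position b of B.
f = {("e1", "a1", "b"), ("e2", "a2", "b")}
sigma = {"a1": {"x", "y"}, "a2": {"x"}}

# The run "x" over a1 and the run "x" over a2 stay distinct in the
# disjoint union thanks to their tags: 2 + 1 = 3 runs over b.
assert len(push_forward(f, sigma, "b")) == 3
```

The tagging step is exactly what makes the coproduct in (3) disjoint rather than a plain union of the sets σ(a).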

This construction equips the family of presheaf categories P(A) with the structure of a pseudofunctor P : G −→ **Cat**. Moreover, the pseudofunctor P has comprehension in the sense of Lawvere [8]. For every simple game B, the *comprehension functor* is defined as the composite

$$\{B|-\} \quad : \quad \mathcal{P}(B) \quad \longrightarrow \quad \mathcal{B} \mathcal{F}^+ / B \quad \longrightarrow \quad \mathcal{G} / B$$

which transports every non-deterministic P-strategy to the morphism (2) seen as an object of G/B. One establishes that

**Theorem 1 (Comprehension).** *For every simple game* B*, the comprehension functor*

$$\{B \mid - \} \quad : \quad \mathcal{P}(B) \quad \longrightarrow \quad \mathcal{G}/B$$

*has a left adjoint functor*

$$\mathsf{image} \quad : \quad \mathcal{G}/B \quad \longrightarrow \quad \mathcal{P}(B).$$

Given a deterministic strategy f : A → B, the contravariant presheaf image(f) over the category **tree**(B^P) transports every P-position b of the game B to the set below:

$$\mathsf{image}(f) \quad : \quad b \quad \longmapsto \quad \left\{\, (e,a) \;\middle|\; (e,a,b) \in f \,\right\}$$

Note that the presheaf image(f) may be also described by the formula

$$\mathsf{image}(f) \quad = \quad \mathcal{P}(f)(\ast\_A) \quad \in \quad \mathcal{P}(B)$$

where ∗A is the terminal object in the category P(A) of P-strategies over A. Note that the run-tree {A | ∗A} of the P-strategy ∗A ∈ P(A) is the simple game A itself, with supp_∗A the identity i_A : A → A. In other words, the P-strategy ∗A has exactly one run over each position of the simple game A.
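Under the same kind of finite toy encoding (an illustrative assumption, not the paper's formalism), the identity image(f) = P(f)(∗A) can be checked mechanically:

```python
# image(f)(b) collects the pairs (e, a) with (e, a, b) in f; the
# terminal P-strategy *_A has exactly one run over each position, so
# P(f)(*_A) is image(f) up to the dummy run.  Strategies are encoded
# as finite sets of triples; all names are illustrative.

def image(f, b):
    """image(f)(b) = {(e, a) | (e, a, b) in f}."""
    return {(e, a) for (e, a, b2) in f if b2 == b}

def push_forward(f, sigma, b):
    """P(f)(sigma)(b) as a tagged disjoint union (Definition 6)."""
    return {((e, a), x)
            for (e, a, b2) in f if b2 == b
            for x in sigma.get(a, set())}

f = {("e1", "a1", "b"), ("e2", "a2", "b"), ("e3", "a1", "b'")}
terminal = {a: {"*"} for a in {"a1", "a2"}}   # *_A: one run per position

# Forgetting the dummy run '*' in P(f)(*_A)(b) recovers image(f)(b).
for b in ("b", "b'"):
    assert {(e, a) for ((e, a), _) in push_forward(f, terminal, b)} == image(f, b)
```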

Also note that we will occasionally write the positions of image(f) as b_(e,a) when there is a need to emphasize the fact that image(f) is a contravariant presheaf over **tree**(B^P).

### **5 The Slender-Functional Factorisation Theorem**

In order to establish the comprehension theorem, we prove a factorization theorem in the original category G, which involves slender and functional strategies.

**Definition 7.** *A deterministic strategy* f *in a simple game* A ⊸ B *is slender when for every* P*-position* b *in the simple game* B*, there exists exactly one* P*-position* a *of the simple game* A *and exactly one schedule* e *such that* (e, a, b) ∈ f*.*

By extension, we say that a morphism f : A → B in the category G is slender when the deterministic strategy f is slender in A ⊸ B. Note that every isomorphism f : A → B in the category G is both slender and functional.

**Proposition 4.** *Suppose that* A *and* B *are two simple games and that* f *is a deterministic strategy of* A ⊸ B*. Then, there exist a slender strategy* g : A → C *and a functional strategy* h : C → B *such that* f = h ◦ g*.*

The simple game C is defined as {B | image(f)} while the slender strategy g : A → C is defined as

$$g \quad = \quad \left\{\, (e, a, (e, a, b)) \;\middle|\; (e, a, b) \in f \,\right\}$$

and h : C → B is the functional strategy supp_image(f) associated in Proposition 3 to the P-strategy image(f) ∈ P(B).
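The construction of Proposition 4 can likewise be sketched on finite data; the encoding below (strategies as sets of triples with opaque tokens) and the names `is_slender`, `factorise` are illustrative assumptions:

```python
# A finite sketch of the slender-functional factorisation f = h . g of
# Proposition 4.  The intermediate game C = {B | image(f)} has the
# triples of f themselves as positions.

def is_slender(f, positions_B):
    """Definition 7: every P-position b of B is reached by exactly one
    pair (e, a) with (e, a, b) in f."""
    return all(len({(e, a) for (e, a, b2) in f if b2 == b}) == 1
               for b in positions_B)

def factorise(f):
    """g : A -> C sends (e, a) to the C-position (e, a, b); h : C -> B
    is the projection (e, a, b) |-> b, i.e. the functional leg."""
    g = {(e, a, (e, a, b)) for (e, a, b) in f}
    h = {c: c[2] for c in f}
    return g, h

# f itself is not slender: b1 is reached by two different pairs (e, a).
f = {("e1", "a1", "b1"), ("e2", "a2", "b1"), ("e3", "a1", "b2")}
g, h = factorise(f)

assert not is_slender(f, {"b1", "b2"})
assert is_slender(g, set(f))                        # the first leg is slender
assert all(h[(e, a, b)] == b for (e, a, b) in f)    # h . g recovers f
```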

**Proposition 5.** *Suppose that* s : X → Y *and* f : A → B *are two morphisms of the category* G*. Suppose moreover that* s *is slender and that* f *is functional. Then,* s : X → Y *is orthogonal to* f : A → B *in the sense that for all morphisms* u : X → A *and* v : Y → B *making the diagram* (a) *commute, there exists a unique morphism* h : Y → A *making the diagram* (b) *commute in the category* G*:*

The deterministic strategy h : Y → A is defined as

$$h \;=\; \left\{ (e,y,a) \;\middle|\; \begin{array}{l} \exists x \in X,\ b \in B,\ e\_1, e\_2, e', e'' \in \Upsilon, \\ (e\_1,y,b) \in v \ \wedge\ (e\_2,a,b) \in f \ \wedge\ (e',x,y) \in s \ \wedge\ (e'',x,a) \in u \end{array} \right\}$$

$$\cup\;\left\{ (e,y,a) \;\middle|\; \begin{array}{l} \exists x \in X,\ b \in B,\ e\_1, e\_2, e', e'' \in \Upsilon, \\ (e\_1,y,b) \in v \ \wedge\ (e\_2,a,b) \in f \ \wedge\ (e',x,\pi y) \in s \ \wedge\ (e'',x,\pi a) \in u \end{array} \right\}$$

Note that the position b is uniquely determined by the position a because f is functional, and that the pair (e', x) is uniquely determined by the position y because s is slender. Moreover, by determinism of u = h ◦ s, the schedule e is entirely determined by the schedules e' and e''.

**Theorem 2 (Factorization theorem).** *The classes* <sup>S</sup> *of slender morphisms and* F *of functional morphisms define a factorization system* (S, F) *in the category* G*.*

It is a folklore result that, in that situation, the comprehension theorem (Theorem 1) follows from the factorization theorem. The reason is that the category P(B) is equivalent (by Proposition 3) to the full subcategory BF^+/B of functional strategies in the slice category G/B. Seen from that point of view, the comprehension functor {B | −} coincides with the embedding of BF^+/B into G/B. It is worth noting that for every P-strategy σ ∈ P(A), one has an isomorphism

$$
\sigma \quad \cong \quad \mathsf{image}(\mathsf{supp}\_{\sigma}).
$$

in the category P(A), and that one has an isomorphism

$$\mathcal{P}(f)(\sigma) \quad \cong \quad \mathsf{image}(f \circ \mathsf{supp}\_{\sigma}) \tag{4}$$

in the category P(B), for every morphism f : A → B in the category G. This provides an alternative way to define the pseudofunctor P.

### **6 The Bicategory S of Simple Games and Non-deterministic Strategies**

In this section, we explain how to construct a bicategory S of simple games and non-deterministic strategies, starting from the category G. The first step is to equip the pseudofunctor P with a lax monoidal structure (see Definition 25), based on the definition of the tensor product in the category G formulated in [4]; see Appendix B for details. We start by observing that

**Proposition 6.** *Suppose given two morphisms* f : A → B *and* g : C → D *in the category* G *of simple games and deterministic strategies. The morphism*

$$f \otimes g : A \otimes C \longrightarrow B \otimes D$$

*is slender when* f *and* g *are slender, and functional when* f *and* g *are functional.*

*Proof.* See Appendix G.

Note that the isomorphism image(f ⊗ g) ≅ image(f) ⊗ image(g) follows immediately from this statement and from the factorization theorem (Theorem 2), for every pair of morphisms f : A → B and g : C → D in the category G. The tensor product σ ⊗ τ of two P-strategies σ and τ is defined in the same spirit, using comprehension:

**Definition 8.** *Suppose that* σ ∈ P(A) *is a* P*-strategy of a simple game* A *and that* τ ∈ P(B) *is a* P*-strategy of a simple game* B*. The tensor product* σ ⊗ τ *is the* P*-strategy of the simple game* A ⊗ B *defined as*

$$
\sigma \otimes \tau \quad = \quad \mathsf{image}(\mathsf{supp}\_{\sigma} \otimes \mathsf{supp}\_{\tau}).
$$

Here, the morphism supp_σ ⊗ supp_τ : {A | σ} ⊗ {B | τ} → A ⊗ B denotes the tensor product (computed in the original category G) of the morphisms supp_σ and supp_τ. A direct description of σ ⊗ τ ∈ P(A ⊗ B) is also possible, as the presheaf which transports every position (e, a, b) of the simple game A ⊗ B to the set-theoretic product below:

$$
\sigma \otimes \tau \quad : \quad (e, a, b) \quad \mapsto \quad \sigma(a) \times \tau(b).
$$
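This pointwise description is easy to prototype; the finite encoding and the names below are illustrative assumptions, not the paper's formalism:

```python
# The direct description of the tensor product of P-strategies: over a
# position (e, a, b) of A (x) B, the runs form the set-theoretic
# product sigma(a) x tau(b).

from itertools import product

def tensor(sigma, tau):
    """sigma (x) tau as a function on positions (e, a, b) of A (x) B."""
    def runs(position):
        e, a, b = position
        return set(product(sigma.get(a, set()), tau.get(b, set())))
    return runs

sigma = {"a": {"x", "y"}}
tau = {"b": {"u"}}

# Two runs over a, one run over b: 2 x 1 = 2 runs over (e, a, b).
runs = tensor(sigma, tau)(("e", "a", "b"))
assert runs == {("x", "u"), ("y", "u")}
```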

As indicated in the introduction, the tensor product of P-strategies defines a family of functors m_{A,B} : P(A) × P(B) → P(A ⊗ B) which, together with the isomorphism of categories m_1 : 1 → P(1), equips the pseudofunctor P with a lax monoidal structure:

**Theorem 3.** *The pseudofunctor* P *equipped with the family of functors* m\_{A,B} *and* m\_1 *defines a lax monoidal pseudofunctor from* (G, ⊗, 1) *to* (**Cat**, ×, 1)*.*

*Proof.* See Appendix H.

The bicategory S of simple games and non-deterministic strategies is deduced from the lax monoidal pseudofunctor P in the following generic way, inspired by the idea of monoidal refinement system [16].

**Definition 9.** *The bicategory* S *has simple games* A*,* B*,* C *as objects, with the hom-category* S(A, B) *defined as*

$$\mathcal{S}(A,B) \quad = \quad \mathcal{P}(A \multimap B),$$

*the composition functor*

$$\circ\_{A,B,C} \; : \; \mathcal{P}(B \multimap C) \times \mathcal{P}(A \multimap B) \longrightarrow \mathcal{P}(A \multimap C)$$

*defined as the composite*

$$\mathcal{P}(B \multimap C) \times \mathcal{P}(A \multimap B) \xrightarrow{m\_{B \multimap C, A \multimap B}} \mathcal{P}((B \multimap C) \otimes (A \multimap B)) \xrightarrow{\mathcal{P}(comp\_{A, B, C})} \mathcal{P}(A \multimap C)$$

*where* comp\_{A,B,C} : (B ⊸ C) ⊗ (A ⊸ B) −→ (A ⊸ C) *is the morphism which internalizes composition in the symmetric monoidal closed category* G*. In the same way, the identity in* P(A ⊸ A) *is defined as the composite*

$$1 \xrightarrow{m\_1} \mathcal{P}(1) \xrightarrow{\mathcal{P}(id\_A)} \mathcal{P}(A \multimap A)$$

*where the morphism* id\_A : 1 → (A ⊸ A) *internalizes the identity morphism in* G*.*

**Proposition 7.** *The bicategory* S *is symmetric monoidal closed in the sense that there exists a family of isomorphisms*

$$\Phi\_{A,B,C} \quad : \quad \mathcal{S}(A \otimes B, C) \quad \cong \quad \mathcal{S}(B, A \multimap C).$$

The isomorphism ΦA,B,C is defined as the image by the pseudofunctor P of the isomorphism

$$\begin{array}{ccccc} \varphi\_{A,B,C} & : & (A \otimes B) \multimap C & \cong & B \multimap (A \multimap C) \end{array}$$

in the category G between the underlying simple games. One benefit of our conceptual approach is that the monoidal closed structure of S is neatly deduced from the monoidal closed structure of the original category G.

### **7 The Exponential Modality on the Category G**

Now that the monoidal bicategory S has been defined, we analyze how the exponential modality defined in [4] adapts to our sheaf-theoretic framework.

**Definition 10.** *Let* A *be a simple game.* !A *is the simple game whose set* (!A)\_n *of positions of degree* n *consists of the pairs* (φ, a) *such that* φ *is an* O*-heap on* n *and* a = (a\_1, ..., a\_n) *is a sequence of moves for which, for every* k*, the moves along the branch of* φ *associated to* k *form a position*

$$\{a\_k, a\_{\phi(k)}, a\_{\phi^2(k)}, \dots\}$$

*of the simple game* A*.*

*The predecessor function* π\_n : (!A)\_{n+1} → (!A)\_n *is defined as* π(φ, a) = (φ ↾ (n), a ↾ (n))*.*

**Definition 11.** *Let* f *be a deterministic strategy of* A ⊸ B*. The deterministic strategy* !f *of* !A ⊸ !B *consists of the positions* (e, (φ, a), (ψ, b)) *such that* φ = e^∗ψ *and, for each branch of* (φ, e, π)*, the positions associated to that branch are played by* f*.*

It is worth observing that the construction of !f : !A → !B can be decomposed in the following way. Consider the morphism

$$n\_{A,B} \quad : \quad !(A \multimap B) \quad \longrightarrow \quad !A \multimap {!B}$$

obtained by currying the composite morphism

$$!(A \multimap B) \otimes {!A} \xrightarrow{\ \text{lax monoidal}\ } !((A \multimap B) \otimes A) \xrightarrow{\ !\text{evaluation}\ } {!B}$$

in the symmetric monoidal closed category G, where we use the coercion morphism which provides the exponential modality ! : G → G with the structure of a lax monoidal functor.

**Definition 12 (**#f**).** *Given a deterministic strategy* f *of a simple game* A*, the deterministic strategy* #f *of the simple game* !A *has as positions the pairs* (φ, a) *such that, for each branch of* (φ, a)*, the positions associated to that branch are played by the deterministic strategy* f*.*

**Proposition 8.** *Given a morphism* f : A → B *of the category* G *and its curried form* λa.f : 1 → A ⊸ B*, the composite morphism*

$$1 \xrightarrow{\ \#(\lambda a. f)\ } !(A \multimap B) \xrightarrow{\ n\_{A,B}\ } {!A} \multimap {!B}$$

*is the curried form* λx : !A. !f *in the category* G *of the morphism* !f : !A −→ !B*.*

More details about the original exponential modality on G can be found in Appendix C. By analogy with Proposition 6, we establish that

**Proposition 9.** *Suppose that* <sup>f</sup> : <sup>A</sup> <sup>→</sup> <sup>B</sup> *is a morphism in the category* <sup>G</sup>*. Then, the morphism*

!f : !A −→ !B

*is slender when* f *is slender, and functional when* f *is functional.*

*Proof.* See Appendix I.

### **8 The Exponential Modality on the Bicategory S**

In this section, we define the linear exponential modality ! : S → S on the symmetric monoidal closed bicategory S, in order to define a bicategorical model of intuitionistic linear logic. The construction is inspired by the observation made in the previous section (Proposition 8).

**Definition 13.** *Given a* P*-strategy* σ ∈ P(A) *of a simple game* A*, the* P*-strategy* #σ *of the simple game* !A *is defined as the image in* P(!A) *of the morphism*

$$!\,\mathsf{supp}\_{\sigma} \quad : \quad !\,\{A \mid \sigma\} \quad \longrightarrow \quad !A.$$

Note that the definition of #σ induces a commutative diagram in the category G

where the top arrow is an isomorphism. Moreover, the definition of #σ coincides with the previous definition (Definition 12) when the P-strategy σ = f happens to be deterministic. Consequently, for two games A, B and a deterministic strategy f of A ⊸ B, we have image(!f) ≅ # image(f), and the construction # of Definition 13 agrees with the construction # of Definition 12 on deterministic strategies.

As mentioned in the introduction, the construction σ ↦ #σ defines a functor

p<sup>A</sup> : P(A) −→ P(!A).

Now, remember that a morphism σ : A → B of the bicategory S is defined as a P-strategy

$$
\sigma \in \mathcal{P}(A \multimap B).
$$

For that reason, every such morphism σ : A → B induces a P-strategy

$$\#\sigma \in \mathcal{P}(!(A \multimap B)).$$

In order to turn the P-strategy #σ into a P-strategy

$$!\sigma \in \mathcal{P}(!A \multimap !B)$$

we apply the functor

$$\mathcal{P}(n\_{A,B}) \quad : \quad \mathcal{P}(\,!(A \multimap B)\,) \quad \longrightarrow \quad \mathcal{P}(\,!A \multimap {!B}\,)$$

to the P-strategy #σ, where

$$n\_{A,B} \quad : \quad ! (A \multimap B) \quad \longrightarrow \quad !A \multimap !B$$

denotes the structural morphism of G defined in the previous section. The construction may be summarized as follows:

**Definition 14.** *The morphism* !σ : !A → !B *of the bicategory* S *associated to the morphism* σ : A → B *is defined as the* P*-strategy*

$$\mathcal{P}(n\_{A,B})(\#\sigma) \quad \in \quad \mathcal{P}(!A \multimap! B).$$

**Theorem 4.** *With this definition,* ! : S → S *defines a pseudofunctor from the bicategory* S *to itself.*

*Proof.* See Appendix J.

The family of morphisms

$$
\delta\_A: !A \to !! A \qquad\qquad\qquad \varepsilon\_A: !A \to A
$$

are defined by the same deterministic strategies in P(!A ⊸ !!A) and P(!A ⊸ A) as in the original category G. One checks that the families δ and ε define pseudonatural transformations between pseudofunctors on S (as defined in Definition 26), and that the pseudofunctor ! : S → S defines a comonad in the appropriate bicategorical sense (see Definition 27). The family of morphisms

$$d\_A: !A \to !A \otimes !A \qquad\qquad e\_A: !A \to 1$$

are defined by the same deterministic strategies in P(!A ⊸ !A ⊗ !A) and P(!A ⊸ 1) as in the original category G, and one checks that they define pseudonatural transformations between pseudofunctors on S. One obtains in this way that

**Theorem 5.** *The bicategory* S *equipped with the exponential modality* ! : S → S *defines a bicategorical model of multiplicative intuitionistic linear logic.*

The formal and rigorous verification of these facts would be extremely tedious if done directly in the bicategory S of non-deterministic strategies. Our proof relies on the fact that the constructions of the model (Definitions 9 and 14) are performed by "push" functors P(f) above a structural morphism f living in the original category G. The interested reader will find part of the detailed proof in Appendix K.

### **9 Conclusion**

We construct a bicategory S of simple games and non-deterministic strategies, which is symmetric monoidal closed in the extended 2-dimensional sense. We then equip the bicategory S with a linear exponential modality ! : S → S which defines a bicategorical model of intuitionistic linear logic. This provides, as far as we know, the first sheaf-theoretic and non-deterministic game semantics of intuitionistic linear logic — including, in particular, a detailed description of the exponential modality.

### **A The Category G of Simple Games and Deterministic Strategies**

We recall the construction of the category Υ of schedules performed in [4] and how we deduce from it the category G of simple games and deterministic strategies.

**Definition 15 (Schedule).** *A schedule is defined as a function* e : {1,...,n} → {0, 1} *verifying* e(1) = 1 *and* e(2k + 1) = e(2k) *whenever* 1 ≤ 2k ≤ n − 1*. The numbers of* 0*'s and* 1*'s in* e *are denoted* |e|\_0 *and* |e|\_1 *respectively. A schedule* e *is written* e : |e|\_0 → |e|\_1*.*

A schedule e : p → q may be equivalently seen as a pair l : (p) → (p + q), r : (q) → (p + q) of order-preserving and jointly surjective functions, such that r(1) = 1 and

$$l(i) \text{ odd} \Rightarrow l(i+1) = l(i) + 1 \qquad \qquad r(j) \text{ even} \Rightarrow r(j+1) = r(j) + 1$$

for all 1 ≤ i ≤ p − 1 and 1 ≤ j ≤ q − 1, where (n) stands for the finite ordinal (n) = {1,...,n}.
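As a sanity check, the schedule conditions and the (l, r) decomposition admit a direct executable reading. The sketch below follows the conditions of Definition 15 exactly as printed here; its indexing conventions are an assumption and may differ from those of [4].

```python
# Schedules are 1-indexed 0/1 sequences stored as Python lists.

def is_schedule(e):
    """e(1) = 1, and e(2k+1) = e(2k) whenever 1 <= 2k <= n - 1."""
    n = len(e)
    if n == 0 or e[0] != 1:
        return False
    # e(2k) is e[2k-1] and e(2k+1) is e[2k] in 0-indexed storage.
    return all(e[2 * k] == e[2 * k - 1] for k in range(1, (n + 1) // 2))

def left_right(e):
    """The pair (l, r): the 1-indexed positions of the 0's and of the
    1's, two order-preserving and jointly surjective functions."""
    l = [i + 1 for i, v in enumerate(e) if v == 0]
    r = [i + 1 for i, v in enumerate(e) if v == 1]
    return l, r

# A schedule e : 3 -> 3 and its (l, r) decomposition.
e = [1, 0, 0, 1, 1, 0]
assert is_schedule(e)
l, r = left_right(e)
assert r[0] == 1          # r(1) = 1, as required of schedules
```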

**Definition 16.** *The category of schedules* Υ *has the natural numbers as objects, the schedules* e : p → q *as morphisms from* p *to* q*.*

The identity morphism c : p → p is the copycat schedule c characterized by the fact that c(2k + 1) = c(2k + 2) for all 1 ≤ 2k ≤ 2p − 2. Details on the composition of two schedules e : p → r and e' : r → q as a schedule e • e' : p → q can be found in [4]. Now, we explain how we derive the category G from the category Υ. We start by defining the simple game A ⊸ B of linear maps from A to B:

**Definition 17.** *The simple game* A ⊸ B *is defined by its set* (A ⊸ B)\_n *of positions of degree* n*, consisting of all the triples* (e, a, b) *formed of a schedule* e : p → q *with* p + q = n*, a position* a ∈ A\_p *and a position* b ∈ B\_q*. The predecessor function* π *is defined as*

$$\pi(e,a,b) = \begin{cases} \left(e \upharpoonright \left(n-1\right), \pi(a), b\right) & if \ e(n) = 0\\ \left(e \upharpoonright \left(n-1\right), a, \pi(b)\right) & if \ e(n) = 1 \end{cases}$$

**Definition 18.** *The category* G *has simple games* A, B *as objects, and deterministic* P*-strategies* f, g *of* A ⊸ B *as morphisms from* A *to* B*. Note that we use latin letters instead of greek letters for deterministic strategies. The identity morphism* i\_A : A → A *is defined as the* P*-strategy of* A ⊸ A *whose positions of degree* 2n *are the triples* (c, a, a) *where* c : n → n *is the copycat schedule and* a ∈ A\_n*. The composite* g ◦ f : A → C *of two deterministic* P*-strategies* f : A → B *and* g : B → C *is the deterministic* P*-strategy whose set of positions of degree* 2n *is defined as*

$$(g \circ f)\_{2n} = \coprod\_{\substack{e\,:\,p \to r,\ e'\,:\,r \to q \\ p+q=2n}} \left\{ \left( e \bullet e', a, c \right) \;\middle|\; \exists b \in B\_r,\ (e, a, b) \in f\_{p+r},\ (e', b, c) \in g\_{r+q} \right\}$$

### **B The Tensor Product in the Category G**

**Definition 19 (Tensorial schedule).** *<sup>A</sup>* <sup>⊗</sup>*-schedule is a function* <sup>e</sup> : {1,...,n}→{0, 1} *verifying* e(2k + 1) = e(2k + 2) *whenever* 0 ≤ 2k ≤ n − 2*.*

**Definition 20 (**A ⊗ B**).** *The positions of the simple game* A ⊗ B *of degree* n *are the triples* (e, a, b) *where* e : p ⊗ q *is a* ⊗*-schedule with* p + q = n*,* a ∈ A\_p *and* b ∈ B\_q*. The predecessor function* π *is defined as*

$$\pi(e,a,b) = \begin{cases} (e \upharpoonright (n-1), \pi(a), b) \text{ if } e(n) = 0\\ (e \upharpoonright (n-1), a, \pi(b)) \text{ if } e(n) = 1 \end{cases}$$

*The simple game* 1 *is the simple game with a unique position* ∗*, of degree* 0*.*

We can also define ⊗ on strategies. Intuitively, for f : A → B and g : C → D two morphisms of the category G, the plays of the strategy f ⊗ g of the simple game (A ⊗ C) ⊸ (B ⊗ D) are obtained by combining plays of f and g through a tensorial schedule.

The intuition is that, once we know the structure of f and g, the structure of plays of f ⊗ g is entirely directed by what happens in B ⊗ D. The only agency that Opponent really has is to decide at some points whether to play on B or D, the rest being handled by the plays of f, g and the structure of (A ⊗ C) ⊸ (B ⊗ D). Formally, this gives the proposition:

**Proposition 10.** *Let* f : A ⊸ B *and* g : C ⊸ D *be two deterministic strategies. Given a valid play of* f ⊗ g : A ⊗ C ⊸ B ⊗ D *and the associated schedules* e : A ⊗ C → B ⊗ D*,* t\_1 : A × C*,* t\_2 : B × D*,* e\_1 : A → B*,* e\_2 : C → D*, the knowledge of* t\_2, e\_1, e\_2 *is enough to reconstruct* e *and* t\_1*.*

*Proof.* The first O move of such a play is in B ⊗ D, following the structure of A ⊗ C ⊸ B ⊗ D. This is given to us by t_2. Let us assume it is a move in D (the other case is handled similarly).

The P move after that will necessarily be a move in C or D, as playing a move in A or B would break the structure of A ⊸ B or B ⊗ D respectively. e_2 gives us this information.


In this last case, the following O move will be a move in C, as a move in A, B or D would break the structure of A ⊸ B, B ⊗ D or C ⊸ D respectively. e is then at 100 and t_1 at 11.

Finally, the following P move will be a move in either C or D, as a move in A or B would break the structure of A ⊸ B or B ⊗ D respectively. e_2 gives us this information.


To sum up the described construction: once an Opponent move in B or D is played, the play is stuck in A ⊸ B or C ⊸ D respectively until a Player move is played in B or D. t_2 decides whether to play the Opponent move in B or D, and e_1 then guides the play in A ⊸ B in the first case, while e_2 guides it in C ⊸ D in the second. This guides us through the whole play and allows us to reconstruct both e and t_1.

In particular, any compatible plays of f, g, B ⊗ D induce a play of f ⊗ g.

This proposition and its proof are key to several proofs in the rest of the paper.

**Proposition 11.** *The category* (G, ⊗, 1, ⊸) *is symmetric monoidal closed.*

### **C The Exponential Modality on the Category G**

In this section, we recall the combinatorial structures introduced in [4] to construct the linear exponential comonad ! : G → G on the symmetric monoidal closed category G.

**Definition 21 (Pointer function).** *A pointer function on* n *is a parity-reversing function*

φ : {1,...,n} −→ {0,...,n − 1}

*such that* φ(i) < i *for all* i*. A pointer function* φ *is called an* O*-heap if* φ(2k) = 2k − 1 *for all* k*, and a* P*-heap if* φ(2k + 1) = 2k *for all* k*. The set* {k, φ(k), φ^2(k), ...} *is called the branch of* φ *associated to the integer* k*. Note that the predecessor function* π *defined by* π(i) = i − 1 *for all* i *is both an* O*-heap and a* P*-heap.*
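The combinatorics of Definition 21 is easy to animate; the list encoding below (with `phi[i-1]` standing for φ(i)) and the helper names are illustrative assumptions:

```python
# An executable reading of Definition 21: pointer functions, O-heaps,
# P-heaps and branches, with a pointer function on n stored as a
# 0-indexed list of length n.

def is_pointer_function(phi):
    """phi : {1..n} -> {0..n-1} is parity-reversing and phi(i) < i."""
    return all(0 <= phi[i - 1] < i and (phi[i - 1] + i) % 2 == 1
               for i in range(1, len(phi) + 1))

def is_O_heap(phi):
    """phi(2k) = 2k - 1 for every even argument."""
    return is_pointer_function(phi) and all(
        phi[2 * k - 1] == 2 * k - 1 for k in range(1, len(phi) // 2 + 1))

def is_P_heap(phi):
    """phi(2k+1) = 2k for every odd argument."""
    return is_pointer_function(phi) and all(
        phi[2 * k] == 2 * k for k in range(0, (len(phi) + 1) // 2))

def branch(phi, k):
    """The branch {k, phi(k), phi^2(k), ...}, stopping at the root 0."""
    out = []
    while k > 0:
        out.append(k)
        k = phi[k - 1]
    return out

# The predecessor pi(i) = i - 1 is both an O-heap and a P-heap.
pi = [i for i in range(6)]
assert is_O_heap(pi) and is_P_heap(pi)
assert branch(pi, 4) == [4, 3, 2, 1]
```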

**Definition 22.** *Suppose that* e : p → q *is a schedule, that* φ *is an* O*-heap over* q *and that* ψ *is a* P*-heap over* p*. The* O*-heap* (φ, e, ψ) *on* p + q *is defined as follows:*

$$(\phi, e, \psi)(k) \;=\; \begin{cases} r(\phi(j)) & \text{if } k = r(j) \text{ is odd} \\ l(\psi(i)) & \text{if } k = l(i) \text{ is odd} \\ k - 1 & \text{otherwise} \end{cases}$$

*where the schedule* e *is represented as a pair* (l, r) *as explained in Appendix A. Intuitively, the* O*-heap* (φ, e, ψ) *points along* φ *when the schedule* e *is at* 1 *and along* ψ *otherwise. The fact that* (φ, e, ψ) *defines an* O*-heap is ensured by the even case.*

We recall the partial order over the set of pointer functions introduced in [4].

**Definition 23 (Generalization).** *Given two pointer functions* φ*,* ψ*, we say that* φ *is a generalization of* ψ*, written* φ ⪯ ψ*, if the branch of* φ *associated to each* k ∈ {1, ..., n} *can be injected into the branch of* ψ *associated to* k *or, in other words, if for all* k *there exists* j *such that* φ(k) = ψ^j(k)*.*
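The generalization order also admits a direct executable reading under the same kind of 0-indexed list encoding (an illustrative assumption):

```python
# Definition 23, executably: phi is a generalization of psi when each
# value phi(k) is reached from k by iterating psi.

def iterates(psi, k):
    """The branch k, psi(k), psi^2(k), ... down to the root 0."""
    seen = []
    while k > 0:
        seen.append(k)
        k = psi[k - 1]
    seen.append(0)
    return seen

def generalizes(phi, psi):
    """phi generalizes psi: for all k there is j with phi(k) = psi^j(k)."""
    return all(phi[k - 1] in iterates(psi, k)
               for k in range(1, len(phi) + 1))

pi = [0, 1, 2, 3]        # predecessor on {1..4}
phi = [0, 1, 0, 3]       # phi(3) = 0 = pi^3(3): shortcuts along pi

# Every pointer function generalizes the predecessor, not conversely.
assert generalizes(phi, pi)
assert not generalizes(pi, phi)
```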

Later in the paper, and in certain proofs, we will also need to look into the structure of !!A. Intuitively, positions of !!A are pairs (φ, u) where u is a sequence of positions of !A and φ is an O-heap. This is equivalent to another representation using only a sequence of positions of A:

**Proposition 12.** *A position* (φ, u) *of* !!A *is equivalent to a triple* (φ, ψ, a) *with* φ ⪯ ψ*,* ψ *an* O*-heap and* a *a sequence of positions of* A*, verifying*

$$\forall i, j \in \{1, \ldots, n\}, (i \neq j) \Rightarrow \exists k, a\_{\phi^k(i)} \neq a\_{\phi^k(j)}$$

*The moves alongside the branches of* ψ *are then plays of the simple game* A*.*

From this follows a description of the strategy

$$!!f \quad : \quad !!A \quad \longrightarrow \quad !!B$$

for a deterministic strategy f : A ⊸ B. The positions of !!f are of the form

$$\left(e, \left(\phi, \psi, \overline{a}\right), \left(\phi', \psi', \overline{b}\right)\right)$$

where φ = e^∗φ′ and ψ = e^∗ψ′, and each thread of (ψ, e, π) is a play of the strategy f.

### **D Some Bicategorical Definitions**

In this section, we recall a few definitions required by our bicategorical setting.

**Definition 24.** *A pseudofunctor is a mapping between bicategories* C *and* D *where the usual functorial equations* F(f ◦ g) = F(f) ◦ F(g) *and* F(id\_A) = id\_{F(A)} *are only valid up to natural bijective* 2*-morphisms in* D*.*

**Definition 25.** *Let* (C, ⊗C, 1C) *and* (D, ⊗D, 1D) *be two monoidal bicategories. A lax monoidal pseudofunctor between them is given by:*

*– a pseudofunctor* F : C → D*;*
*– a family of morphisms* μ\_{A,B} : F(A) ⊗D F(B) → F(A ⊗C B) *natural in* A *and* B*;*
*– a morphism* μ\_1 : 1D → F(1C)*;*

*satisfying the following conditions:*

*– associativity: For every triple of objects* A, B, C ∈ C*, the two composites*

$$(F(A)\otimes\_{\mathcal{D}}F(B))\otimes\_{\mathcal{D}}F(C) \xrightarrow{\mu\_{A,B}\otimes id} F(A\otimes\_{\mathcal{C}}B)\otimes\_{\mathcal{D}}F(C) \xrightarrow{\mu\_{A\otimes B,C}} F((A\otimes\_{\mathcal{C}}B)\otimes\_{\mathcal{C}}C) \xrightarrow{F(a^{\mathcal{C}}\_{A,B,C})} F(A\otimes\_{\mathcal{C}}(B\otimes\_{\mathcal{C}}C))$$

*and*

$$(F(A)\otimes\_{\mathcal{D}}F(B))\otimes\_{\mathcal{D}}F(C) \xrightarrow{a^{\mathcal{D}}\_{F(A),F(B),F(C)}} F(A)\otimes\_{\mathcal{D}}(F(B)\otimes\_{\mathcal{D}}F(C)) \xrightarrow{id\otimes\mu\_{B,C}} F(A)\otimes\_{\mathcal{D}}F(B\otimes\_{\mathcal{C}}C) \xrightarrow{\mu\_{A,B\otimes C}} F(A\otimes\_{\mathcal{C}}(B\otimes\_{\mathcal{C}}C))$$

*coincide,*

*where the two morphisms* a^C*,* a^D *denote the associators of the two tensor products.*

*– unitality: For every object* A ∈ C*, the composite*

$$\mathbf{1}\_{\mathcal{D}} \otimes\_{\mathcal{D}} F(A) \xrightarrow{\mu\_1 \otimes id} F(\mathbf{1}\_{\mathcal{C}}) \otimes\_{\mathcal{D}} F(A) \xrightarrow{\mu\_{\mathbf{1}\_{\mathcal{C}},A}} F(\mathbf{1}\_{\mathcal{C}} \otimes\_{\mathcal{C}} A) \xrightarrow{F(l^{\mathcal{C}}\_A)} F(A)$$

*coincides with the left unitor* l^D\_{F(A)}*, and symmetrically for the right unitor,*

*where* l^C*,* l^D *denote the left unitors of the two tensor products.*

**Definition 26.** *Let* F, G *be two pseudofunctors between two bicategories* C *and* D*. A pseudonatural transformation* φ : F → G *is given by:*

*– a morphism* φ(A) : F(A) → G(A) *for every object* A *of* C*;*
*– an invertible* 2*-morphism* φ(f) : φ(B) ◦ F(f) ⇒ G(f) ◦ φ(A) *for every morphism* f : A → B*;*

*such that*

*–* φ *respects composition of morphisms, meaning that we have an equivalence between*

$$(\phi(A)\lhd G(f,g))\cdot(\phi(f)\rhd G(g))\cdot(F(f)\lhd\phi(g))$$

*and*

$$
\phi(g \circ f) \cdot (F(f, g) \rhd \phi(C)),
$$

*both being* 2*-morphisms from*

$$
\phi(C) \circ F(g) \circ F(f) \Rightarrow G(g \circ f) \circ \phi(A),
$$

*where* · *is the vertical composition between* 2*-morphisms,* ◁*,* ▷ *the two versions of the horizontal composition between a morphism and a* 2*-morphism (also called whiskering), and* F(f,g) : F(g) ◦ F(f) ⇒ F(g ◦ f) *is the bijective* 2*-morphism coming from the pseudofunctor* F*.*

*–* φ *respects the identity morphisms, meaning we have an equivalence between*

$$L^{\mathcal{D}}\_{\phi(A)} \cdot \epsilon^F\_{id\_A} \rhd \phi(A)$$

*and*

$$R^{\mathcal{D}}\_{\phi(A)} \cdot \phi(A) \lhd \epsilon^G\_{id\_A} \cdot \phi(id\_A).$$

*both being* 2*-morphisms from*

$$
\phi(A) \circ F(id\_A) \Rightarrow \phi(A),
$$

*where* L^D\_{φ(A)} : φ(A) ◦ id\_{F(A)} ⇒ φ(A) *is the left unitor coming from the bicategory* D *and* ε^F\_{id\_A} : F(id\_A) ⇒ id\_{F(A)} *is the bijective* 2*-morphism coming from the pseudofunctor* F*.*

*–* φ *is natural in the following sense: for every* 2*-morphism* ψ : f ⇒ g *with* f,g : A → B*, we have an equivalence between*

$$
\phi(g) \cdot F(\psi) \rhd \phi(B).
$$

*and*

$$
\phi(A) \lhd G(\psi) \cdot \phi(f) \,.
$$

**Definition 27.** *A fully weak comonad* G *on a bicategory* C *is a pseudofunctor, along with pseudonatural transformations* δ *and* ε *that satisfy the usual laws of a comonad up to natural bijective* 2*-morphisms in* C*.*

### **E Proof of Proposition 2**

*Proof.* Let A, B be two games.

Let σ be a P-cartesian transduction between A and B. The associated deterministic strategy f_σ is simply given by:

$$f\_{\sigma}(2n) = \{\, (c, a, \sigma(a)) \mid a \in A\_n \,\}$$

This definition clearly gives a functional strategy, the determinism being given by the fact that σ is P-cartesian.

Conversely, let f be a functional strategy of A ⊸ B. The associated P-cartesian transduction σ_f is given by:

$$
\sigma\_f(2n)(a) = b \quad \text{s.t.} \quad (c, a, b) \in f(4n).
$$

Such a b is unique by functionality of f.

### **F Proof that P is a pseudofunctor**

*Proof.* First, we need to complete the definition of P by detailing why, for f a deterministic strategy of A ⊸ B and σ a P-strategy over A, P(f)(σ) is indeed a P-strategy over B, and thus a presheaf over **tree**(B^P). For this, we define the collection of projection functions π_{2n} : P(f)(σ)(2n) → P(f)(σ)(2n − 2) as follows:

For x ∈ P(f)(σ)(2n) over b (meaning x ∈ P(f)(σ)(b) and b ∈ B_{2n}), there exists by definition a unique pair (e, a) such that (e, a, b) ∈ f and x ∈ σ(a). From this, we define:

$$\pi\_{2n}(x) \;=\; \pi\_{\sigma}^{k}(x) \qquad \text{where } k \text{ is such that } (\pi^{2k+2}(e),\, \pi\_A^{2k}(a),\, \pi\_B^{2}(b)) \in f.$$

By determinism of f, there is only one such k. Moreover, we also have π_σ^k(x) ∈ σ(π_A^{2k}(a)). Consequently, by definition of P(f)(σ), we have π_σ^k(x) ∈ P(f)(σ)(π_B^2(b)) as expected.

The next step is to show that, for a strategy f : A → B, P(f) is a functor from P(A) to P(B). For that, we need to define its effect on simulations. For α : σ → τ, P(f)(α) : P(f)(σ) → P(f)(τ) is simply defined by applying α to all positions of P(f)(σ), as all of those are induced from positions of σ by definition. With this, it is easy to verify that P(f) preserves identities and composition of simulations.

Finally, let us show that P is a pseudofunctor.

First, P(Id_A)(σ) associates to a position a of A the set:

$$
\mathcal{P}(Id_A)(\sigma) : a \mapsto \coprod_{(c,\, a',\, a)\, \in\, Id_A} \sigma(a') \,,
$$

which is immediately isomorphic to σ(a). Factoring in the effect on simulations, it is easy to build a bijective natural transformation between P(Id_A) and Id_{P(A)}. Thus P(Id_A) ≅ Id_{P(A)}.

Next, let f : A → B and g : B → C be two deterministic strategies and σ a P-strategy of A. We have:

$$
\mathcal{P}(g)(\mathcal{P}(f)(\sigma)) : c \mapsto \coprod_{(e_2,\, b,\, c)\, \in\, g} \ \coprod_{(e_1,\, a,\, b)\, \in\, f} \sigma(a).
$$

This is easily seen to be isomorphic to P(g ∘ f)(σ), which is given by:

$$
\mathcal{P}(g \circ f)(\sigma) : c \mapsto \coprod_{(e,\, a,\, c)\, \in\, g \circ f} \sigma(a) \,.
$$

This isomorphism is a consequence of the definition of composition for deterministic strategies: for a position (e, a, c) ∈ g ∘ f, there is only one triple (e₁, e₂, b) such that (e₁, a, b) ∈ f, (e₂, b, c) ∈ g and e = e₁ · e₂.

This extends into a natural isomorphism between the functors P(g ∘ f) and P(g) ∘ P(f), giving us that P is indeed a pseudofunctor.
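The unique-decomposition fact underlying this isomorphism can be checked on a small example. The sketch below encodes deterministic strategies as sets of triples with string witnesses (an illustrative assumption) and confirms that the coproduct over g ∘ f indexes the same copies of σ(a) as the nested coproduct.

```python
# Composition of "strategies" as relational composition of triples,
# concatenating the witness components e1 and e2.

def compose(f, g):
    return {(e1 + e2, a, c)
            for (e1, a, b1) in f for (e2, b2, c) in g if b1 == b2}

f = {("x", "a", "b")}
g = {("y", "b", "c")}
gf = compose(f, g)
assert gf == {("xy", "a", "c")}  # unique decomposition e = e1 · e2

# The nested coproduct over g then f, and the coproduct over g∘f,
# enumerate the same copies of sigma(a):
sigma = {"a": ["s1", "s2"]}
nested = [x for (e2, b, c) in g for (e1, a, b1) in f if b1 == b
          for x in sigma[a]]
direct = [x for (e, a, c) in gf for x in sigma[a]]
assert sorted(nested) == sorted(direct)
```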

### **G Proof of Proposition 6**


### **H Proof of Theorem 3**

*Proof.* First, we can note that the unit 1 of G has a unique P-strategy, the empty strategy. Consequently, P(1) is the singleton category, which is the unit of the cartesian product in **Cat**.

Moreover, to extend P as a lax monoidal pseudofunctor, we need a transformation μA,B : P(A) × P(B) → P(A ⊗ B) natural in A and B.

Since the morphisms of that transformation live in **Cat**, they are functors. We thus define:

for σ an object of P(A) and τ an object of P(B),

$$
\mu\_{A,B}(\sigma,\tau) = \sigma \otimes \tau
$$

for α : σ → σ′ a morphism of P(A) and β : τ → τ′ a morphism of P(B), μ_{A,B}(α, β) : σ ⊗ τ → σ′ ⊗ τ′ is defined by:

$$\mu\_{A,B}(\alpha,\beta)(t,x,y) = (t,\alpha(x),\beta(y))$$
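As a sanity check, the componentwise action of μ_{A,B} on simulations can be modelled concretely; the encoding of positions as tuples and of simulations as Python functions is an illustrative assumption.

```python
# mu(alpha, beta) acts componentwise on a position (t, x, y) of sigma ⊗ tau.

def mu(alpha, beta):
    return lambda pos: (pos[0], alpha(pos[1]), beta(pos[2]))

alpha = {"x0": "x1"}.get   # a simulation sigma -> sigma', as a function
beta = {"y0": "y1"}.get    # a simulation tau -> tau'
assert mu(alpha, beta)(("t", "x0", "y0")) == ("t", "x1", "y1")

# Acting componentwise, mu preserves identity simulations, which is the
# kind of bookkeeping the naturality argument below relies on.
ident = lambda z: z
assert mu(ident, ident)(("t", "x0", "y0")) == ("t", "x0", "y0")
```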

We now need to prove that this transformation is natural in A and B, and that it verifies the two commutative diagrams of a lax monoidal functor (associativity and unitality), up to bijective simulations. Those last two are easy to verify and use similar arguments, so we will focus on the naturality.

We need our transformation to verify the following commutative diagram for A, B, A′, B′ four games and f : A ⊸ A′, g : B ⊸ B′ two deterministic strategies:

$$
\begin{array}{ccc}
\mathcal{P}(A) \times \mathcal{P}(B) & \xrightarrow{\ \mu_{A,B}\ } & \mathcal{P}(A \otimes B)\\
{\scriptstyle \mathcal{P}(f) \times \mathcal{P}(g)} \big\downarrow & & \big\downarrow {\scriptstyle \mathcal{P}(f \otimes g)}\\
\mathcal{P}(A') \times \mathcal{P}(B') & \xrightarrow{\ \mu_{A',B'}\ } & \mathcal{P}(A' \otimes B')
\end{array}
$$

Let σ be a P-strategy of A and τ a P-strategy of B. Verifying the commutative diagram amounts to finding two reciprocal morphisms between P(f)(σ) ⊗ P(g)(τ) and P(f ⊗ g)(σ ⊗ τ).

$$
\begin{aligned}
\mathcal{P}(f)(\sigma) \otimes \mathcal{P}(g)(\tau) &\cong \mathsf{image}(f \circ \mathsf{supp}_{\sigma}) \otimes \mathsf{image}(g \circ \mathsf{supp}_{\tau})\\
&\cong \mathsf{image}(f \circ \mathsf{supp}_{\sigma} \otimes g \circ \mathsf{supp}_{\tau}) && \text{by consequences of Prop. 6}
\end{aligned}
$$

$$
\begin{aligned}
\mathcal{P}(f \otimes g)(\sigma \otimes \tau) &\cong \mathsf{image}((f \otimes g) \circ \mathsf{supp}_{\sigma \otimes \tau})\\
&\cong \mathsf{image}((f \otimes g) \circ (\mathsf{supp}_{\sigma} \otimes \mathsf{supp}_{\tau})) && \text{by consequences of Prop. 6}
\end{aligned}
$$

By bifunctoriality of ⊗, we have f ∘ supp_σ ⊗ g ∘ supp_τ ≅ (f ⊗ g) ∘ (supp_σ ⊗ supp_τ), giving us the equality of the images we need, up to bijective simulations.

### **I Proof of Proposition 9**

*Proof.* – Let (ψ, b = b₁, ..., bₙ) be a P position of !B. Since f is slender, for all player positions b_i of b, there exists a unique pair (e_i, a_i) such that (e_i, a_i, b_i) ∈ f.

We use a method similar to the one used in the proof of Proposition 10. Instead of using the tensorial schedule to guide us in reconstructing the play of !A ⊸ !B, we use ψ, which tells us which player move b_i to reach next (starting from b_{i−2}, and assuming we have reconstructed e and φ so far), and then use the play (e_i, a_i, b_i) to construct the play.

The sequence of moves we add is the suffix of the play (e_i, a_i, b_i) of the form b_{i−1} a¹_i ... a^k_i b_i (with a^k_i = a_i), as any other move in the play (e_i, a_i, b_i) has already been played (since, in particular, any b move prior to b_{i−1} has been played).

Player cannot backtrack in the middle of the sequence b_{i−1} a¹_i ... a^k_i b_i without breaking the fact that the full play is associated to an O-heap in !(A ⊸ B).

This allows us to extend e into e·1·0^k·1 and φ by linking a¹_i to its predecessor in A of the play (e_i, a_i, b_i).

This method constructs a valid position of !f, as all branches are played following f and φ is an O-heap. It is the only possible position including (ψ, b), as everything we have done was determined by ψ, f and b. Thus !f is a slender strategy.

– Let (φ, a = a₁, ..., aₙ) be an O position of !A. Since f is a functional strategy, for all opponent positions a_i of a, there exists a unique b_i such that (c, a_i, b_i) ∈ f. By determinism of f, the same holds for all player positions of a. Using φ as a guide, this easily allows us to construct the position of !f: (c, (φ, a), (φ, b = b₁, ..., bₙ)).

It is the unique such position for (φ, a), for reasons similar to the ones given in the proof for slender strategies. Thus !f is a functional strategy.

### **J Proof of Theorem 4**

*Proof.* – For a game A, we have by construction:

$$
\begin{aligned}
(!_{\mathcal{P}})_{A,B}(Id_A) &= \mathcal{P}(n_{A,B}) \circ \#^{\mathcal{S}}(Id_A) \\
&= \mathcal{P}(n_{A,B})(\# Id_A) = Id_{!A}
\end{aligned}
$$

– Let A, B, C be three games, σ a P-strategy of A ⊸ B, and τ a P-strategy of B ⊸ C. We need to prove that there is a natural isomorphic simulation between !_P(τ ∘ σ) and !_P(τ) ∘ !_P(σ).

First we will simplify those two strategies through the various properties we have seen so far:

First !P(τ ◦ σ):

$$
\begin{aligned}
!_{\mathcal{P}}(\tau \circ \sigma) &= \mathcal{P}(n_{A,C})(\#^{\mathcal{S}}(\tau \circ \sigma))\\
&\cong \mathsf{image}(n_{A,C} \circ \mathsf{supp}_{\#^{\mathcal{S}}(\tau\circ\sigma)}) && \text{by Equation 4}\\
&\cong \mathsf{image}(n_{A,C} \circ\, !\mathsf{supp}_{\tau\circ\sigma}) && \text{by consequence of Def. 13}\\
&\cong \mathsf{image}(n_{A,C} \circ\, !\mathsf{supp}_{\mathcal{P}(comp_{A,B,C})(\sigma\otimes\tau)}) && \text{by Definition 9}\\
&\cong \mathsf{image}(n_{A,C} \circ\, !\mathsf{supp}_{\mathsf{image}(comp_{A,B,C}\,\circ\,\mathsf{supp}_{\sigma\otimes\tau})}) && \text{by Equation 4}\\
&\cong \mathsf{image}(n_{A,C} \circ \mathsf{supp}_{\mathsf{image}(!(comp_{A,B,C}\,\circ\,\mathsf{supp}_{\sigma\otimes\tau}))}) && \text{by consequence of Def. 13}\\
&\cong \mathsf{image}(n_{A,C} \circ\, !(comp_{A,B,C} \circ \mathsf{supp}_{\sigma\otimes\tau})) && \text{by Theorem 1}\\
&\cong \mathsf{image}(n_{A,C} \circ\, !comp_{A,B,C} \circ \mathsf{supp}_{\#^{\mathcal{S}}(\sigma\otimes\tau)}) && \text{by functoriality of } !\ \text{and consequence of Def. 13}
\end{aligned}
$$
Then, !P(τ ) ◦ !P(σ):

$$
\begin{aligned}
!_{\mathcal{P}}(\tau) \circ\, !_{\mathcal{P}}(\sigma) &= \mathcal{P}(n_{B,C})(\#^{\mathcal{S}}\tau) \circ \mathcal{P}(n_{A,B})(\#^{\mathcal{S}}\sigma)\\
&\cong \mathsf{image}(n_{B,C} \circ \mathsf{supp}_{\#^{\mathcal{S}}(\tau)}) \circ \mathsf{image}(n_{A,B} \circ \mathsf{supp}_{\#^{\mathcal{S}}(\sigma)}) && \text{by Equation 4}\\
&\cong \mathcal{P}(comp_{!A,!B,!C})(\mathsf{image}(n_{A,B} \circ \mathsf{supp}_{\#^{\mathcal{S}}(\sigma)}) \otimes \mathsf{image}(n_{B,C} \circ \mathsf{supp}_{\#^{\mathcal{S}}(\tau)})) && \text{by Definition 9}\\
&\cong \mathcal{P}(comp_{!A,!B,!C})(\mathsf{image}(n_{A,B} \circ \mathsf{supp}_{\#^{\mathcal{S}}(\sigma)} \otimes n_{B,C} \circ \mathsf{supp}_{\#^{\mathcal{S}}(\tau)})) && \text{by consequence of Prop. 6}\\
&\cong \mathsf{image}(comp_{!A,!B,!C} \circ \mathsf{supp}_{\mathsf{image}(n_{A,B}\,\circ\,\mathsf{supp}_{\#^{\mathcal{S}}(\sigma)}\,\otimes\, n_{B,C}\,\circ\,\mathsf{supp}_{\#^{\mathcal{S}}(\tau)})}) && \text{by Equation 4}\\
&\cong \mathsf{image}(comp_{!A,!B,!C} \circ (n_{A,B} \circ \mathsf{supp}_{\#^{\mathcal{S}}(\sigma)} \otimes n_{B,C} \circ \mathsf{supp}_{\#^{\mathcal{S}}(\tau)})) && \text{by Theorem 1}\\
&\cong \mathsf{image}(comp_{!A,!B,!C} \circ (n_{A,B} \otimes n_{B,C}) \circ (\mathsf{supp}_{\#^{\mathcal{S}}(\sigma)} \otimes \mathsf{supp}_{\#^{\mathcal{S}}(\tau)})) && \text{by bifunctoriality of } \otimes\\
&\cong \mathsf{image}(comp_{!A,!B,!C} \circ (n_{A,B} \otimes n_{B,C}) \circ (!\mathsf{supp}_{\sigma}\, \otimes\, !\mathsf{supp}_{\tau})) && \text{by consequence of Def. 13}
\end{aligned}
$$
We intend to prove that those two images are isomorphic. For that, we will make the following remark:

! is lax monoidal in G, meaning that there exists a transformation μA,B : !A⊗!B →!(A ⊗ B) natural in A and B. Thus we have the following diagram with the top square commuting by naturality of μ:

In more detail, positions of μ_{A,B} are of the form (e, (t, φ, a, ψ, b), (Φ, t′, a, b)), where, for a position (Φ, t′, a, b) of !(A ⊗ B), one can rebuild the unique associated position by playing the moves in order and building the tensorial schedule and the O-heaps incrementally, the general structure ensuring that we do get them in the end. Consequently, μ_{A,B} is slender and induces a transduction from B to A.

Note that it is not bijective, as the play of !(A ⊗ B) where we play in B, then backtrack to play in A, would produce the same play in !A ⊗ !B as playing in B then in A without backtracking.

Thus, since μ_{A⊸B, B⊸C} is slender, we have:

$$
\mathsf{image}(n_{A,C} \circ\, !comp_{A,B,C} \circ \mathsf{supp}_{\#^{\mathcal{S}}(\sigma\otimes\tau)}) \cong \mathsf{image}(n_{A,C} \circ\, !comp_{A,B,C} \circ \mathsf{supp}_{\mathcal{P}(\mu_{A\multimap B,\,B\multimap C})(\#^{\mathcal{S}}\sigma\,\otimes\,\#^{\mathcal{S}}\tau)})
$$

Then, by naturality,

$$
\mathsf{image}(n_{A,C} \circ\, !comp_{A,B,C} \circ \mathsf{supp}_{\#^{\mathcal{S}}(\sigma\otimes\tau)}) \cong \mathsf{image}(n_{A,C} \circ\, !comp_{A,B,C} \circ \mu_{A\multimap B,\,B\multimap C} \circ \mathsf{supp}_{\#^{\mathcal{S}}\sigma\,\otimes\,\#^{\mathcal{S}}\tau})
$$

Consequently,

$$
\begin{aligned}
\mathsf{image}(n_{A,C} \circ\, !comp_{A,B,C} \circ \mathsf{supp}_{\#^{\mathcal{S}}(\sigma\otimes\tau)}) \cong\\
\mathsf{image}(comp_{!A,!B,!C} \circ (n_{A,B} \otimes n_{B,C}) \circ (!\mathsf{supp}_{\sigma}\, \otimes\, !\mathsf{supp}_{\tau}))
\end{aligned}
$$

if and only if

$$
\begin{aligned}
\mathsf{image}(n_{A,C} \circ\, !comp_{A,B,C} \circ \mu_{A\multimap B,\,B\multimap C} \circ \mathsf{supp}_{\#^{\mathcal{S}}\sigma\,\otimes\,\#^{\mathcal{S}}\tau}) \cong\\
\mathsf{image}(comp_{!A,!B,!C} \circ (n_{A,B} \otimes n_{B,C}) \circ (!\mathsf{supp}_{\sigma}\, \otimes\, !\mathsf{supp}_{\tau}))
\end{aligned}
$$

meaning if and only if

$$
\begin{aligned}
\mathsf{image}(n_{A,C} \circ\, !comp_{A,B,C} \circ \mu_{A\multimap B,\,B\multimap C}) \cong\\
\mathsf{image}(comp_{!A,!B,!C} \circ (n_{A,B} \otimes n_{B,C}))
\end{aligned}
$$

An important remark is that μ_{A⊸B, B⊸C} transfers plays p of !(A ⊸ B) ⊗ !(B ⊸ C) such that there exists (e, (φ, a), (ψ, c))_p ∈ image(comp_{!A,!B,!C} ∘ n_{A,B} ⊗ n_{B,C}) to plays p′ of !(A ⊸ B ⊗ B ⊸ C) such that there exists (e, (φ, a), (ψ, c))_{p′} ∈ image(n_{A,C} ∘ !comp_{A,B,C}).

In other words, μ, when restricted to plays that play a role in the images we outlined, acts as a function from the set of plays of !(A ⊸ B) ⊗ !(B ⊸ C) to the set of plays of !(A ⊸ B ⊗ B ⊸ C). This can be proved by looking at the respective structures of the plays, and induces one half of the isomorphism we need.

We do a similar study by introducing a P-strategy of !(A ⊸ B ⊗ B ⊸ C) ⊸ (!(A ⊸ B) ⊗ !(B ⊸ C)) that acts as a converse of μ_{A⊸B, B⊸C} for such plays, and thus get a converse to our morphism, which gives us the second half of the isomorphism we need. Here is how we proceed:

Let (t, (φ, e, a, b), (ψ, f, b, c)) be a play of !(A ⊸ B) ⊗ !(B ⊸ C) such that there exists

$$
(e_{!A\multimap !C}, (\phi_{!A}, \overline{a}), (\phi_{!C}, \overline{c}))_{(t,\, (\phi, \overline{e, a, b}),\, (\psi, \overline{f, b, c}))} \in \mathsf{image}(comp_{!A,!B,!C} \circ n_{A,B} \otimes n_{B,C}).
$$

In particular, this implies that, since n_{A,B} ⊗ n_{B,C} does not change the order of moves, the sequence of moves of (t, (φ, e, a, b), (ψ, f, b, c)) must be a possible left projection of comp_{!A,!B,!C}. This restricts the way the moves can be played.

In particular, B moves from the two components must answer each other right away, giving backtrack-free sequences of the form c(b_r.b_l.b_l.b_r)*c, with similar structures for sequences starting and/or finishing with an A move. In addition, there cannot be any backtrack in A or in either of the two B components that is not initiated by a backtrack in a C component.

The idea is that a backtrack in C induces a backtrack in B, which is mirrored on the left component and induces a backtrack in A. Those backtracks give us a heap structure, and the moves inside a sequence follow a proper tensorial schedule, so it can be seen as a play of !(A ⊸ B ⊗ B ⊸ C). It is then easy to verify that this play produces an element of image(n_{A,C} ∘ !comp_{A,B,C} ∘ μ_{A⊸B, B⊸C}), and that the P-strategy of !(A ⊸ B ⊗ B ⊸ C) ⊸ (!(A ⊸ B) ⊗ !(B ⊸ C)) built by reorganizing structure without changing the order of moves is a converse to μ_{A⊸B, B⊸C}.

Consequently, we have the bijection of images we needed, and thus an isomorphic simulation between !_P(τ ∘ σ) and !_P(τ) ∘ !_P(σ). It is natural since μ and the isomorphisms involved in the manipulation of images are natural.

The few additional diagrams that must be checked are easy to verify with similar methods, and thus !_P is a pseudofunctor.

### **K Proof that ! Is a Pseudocomonad**

In the following section, we detail the construction of the pseudonatural transformations δ and ε and prove their naturality. From those definitions, verifying that ! is a pseudocomonad is easy, as the morphism part of the two pseudonatural transformations coincides with their definition in the deterministic case, making the diagrams commute instantly. After that, we may do a similar study on d, e to give ! the necessary structure to be a linear exponential modality.

We will handle here the case of δ_σ for a P-strategy σ : A → B. This is, by Definition 26, a bijective 2-morphism between !_P!_Pσ ∘ δ_A and δ_B ∘ !_Pσ, both being P-strategies of !A ⊸ !!B.

First note that

$$
!_{\mathcal{P}}!_{\mathcal{P}}\sigma \circ \delta_A = \mathsf{image}(comp_{!A,!!A,!!B} \circ \mathsf{supp}_{!_{\mathcal{P}}!_{\mathcal{P}}\sigma} \otimes \mathsf{supp}_{\delta_A})
$$

and that

$$
\delta_B \circ\, !_{\mathcal{P}}\sigma = \mathsf{image}(comp_{!A,!B,!!B} \circ \mathsf{supp}_{\delta_B} \otimes \mathsf{supp}_{!_{\mathcal{P}}\sigma})\,.
$$

We want to study the structure of both images to find an isomorphic simulation between them.

What we will do is start from a position

$$(e, (\phi_A, \overline{a}), (\psi_B, \phi_B, \overline{b}))$$

of !A ⊸ !!B and go back along the arrows to see what structure the positions that produce this position must have.

First, on the left branch, the presence of comp_{!A,!B,!!B} indicates that the position in (!A ⊸ !B) ⊗ (!B ⊸ !!B) must be of the form

$$
(t, (e_1, (\phi_A, \overline{a}), (\Phi_B, \overline{b'})), (e_2, (\Phi_B, \overline{b'}), (\psi_B, \phi_B, \overline{b})))
$$

for some t, e₁, e₂, Φ_B, b′ such that e₁ · e₂ = e.

Since the right component of this position comes from δ_B, we actually have b′ = b, Φ_B = φ_B, e₂ = c and thus e₁ = e, and we actually have the position

$$
(t, (e, (\phi_A, \overline{a}), (\phi_B, \overline{b})), (c, (\phi_B, \overline{b}), (\psi_B, \phi_B, \overline{b})))
$$

for some t which is fixed by the two components for the composition to work.

And thus, this gives us the following position of R_{!_Pσ} ⊗ R_{δ_B}:

$$(t, ((\phi\_A, e, \pi), \overline{x}), (c, (\phi\_B, \overline{b}), (\psi\_B, \phi\_B, \overline{b})))$$

where x̄ is a sequence of moves that gets projected to the sequence of moves of (e, (φ_A, ā), (φ_B, b̄)). There is no modification of the order the moves are played in this step, just a reorganization of the structure.

Thus a position of R_{δ_B ∘ !_Pσ} is of the form

$$
(e, (\phi_A, \overline{a}), (\psi_B, \phi_B, \overline{b}))_{(t,\, ((\phi_A, e, \pi), \overline{x}),\, (c, (\phi_B, \overline{b}), (\psi_B, \phi_B, \overline{b})))}
$$

We apply a similar reasoning to the right branch to obtain the form of a position of R_{!_P!_Pσ ∘ δ_A}:

$$
(e, (\phi_A, \overline{a}), (\psi_B, \phi_B, \overline{b}))_{(t',\, (c, (\phi_A, \overline{a}), (e*\psi_B, \phi_A, \overline{a})),\, ((e*\psi_B, e, \pi), (\phi_A, e, \pi), \overline{x'}))}
$$

where t′ is fixed by the composition and the sequence of moves x̄′ gets projected to the same sequence of moves as x̄ in the left branch. In particular, both sequences have the same length.

Since everything but the two sequences x̄ and x̄′ is fixed from the initial position (e, (φ_A, ā), (ψ_B, φ_B, b̄)), we can then build δ_σ as the simulation sending each position to the other one sharing that same initial structure and the same sequence x̄.

With a similar study, we build ε_σ as the simulation that sends positions of the form

$$
(e, (\pi, \overline{a}), b)_{(t,\, (c, (\pi, \overline{a}), a),\, x)}
$$

to positions of the form

$$
(e, (\pi, \overline{a}), b)_{(t',\, (\pi, \overline{x}),\, (c, (\pi, \overline{b}), b))}
$$

where t, t′ are fixed by construction and x̄ is the branch of positions finishing in x in R_σ.

*Proof.* We will now prove the pseudonaturality of ε; δ is handled in a similar way. Let us look at the naturality first. Let A, B be two games, σ, τ two P-strategies of A ⊸ B, and α : σ → τ a simulation. We require that the two following pasting diagrams are equivalent:

This amounts to the following equality of simulations:

$$(\epsilon\_A \lhd \alpha) \cdot \epsilon\_\sigma^{-1} = \epsilon\_\tau^{-1} \cdot (!\_{\mathcal{P}} \alpha \rhd \epsilon\_B),$$

where ◁, ▷ indicate the whiskering that results from the composition of P-strategies, and · indicates the vertical composition, which is simply the composition of functions. Thus, for a position

$$
(e, (\pi, \overline{a}), b)_{(t',\, (\pi, \overline{x}),\, (c, (\pi, \overline{b}), b))}
$$

of ε_B ∘ !_Pσ, we have:

$$
\begin{aligned}
(\epsilon_A \lhd \alpha) \cdot \epsilon_\sigma^{-1}\big((e, (\pi, \overline{a}), b)_{(t', (\pi, \overline{x}), (c, (\pi, \overline{b}), b))}\big) &= (\epsilon_A \lhd \alpha)\big((e, (\pi, \overline{a}), b)_{(t, (c, (\pi, \overline{a}), a), x)}\big) && \text{by def. of } \epsilon_\sigma\\
&= (e, (\pi, \overline{a}), b)_{(t, (c, (\pi, \overline{a}), a), \alpha(x))} && \text{by def. of } \mathcal{P}, \epsilon_A
\end{aligned}
$$

On the other hand,

$$
\begin{aligned}
\epsilon_\tau^{-1} \cdot (!_{\mathcal{P}}\alpha \rhd \epsilon_B)\big((e, (\pi, \overline{a}), b)_{(t', (\pi, \overline{x}), (c, (\pi, \overline{b}), b))}\big) &= \epsilon_\tau^{-1}\big((e, (\pi, \overline{a}), b)_{(t', (\pi, \overline{\alpha(x)}), (c, (\pi, \overline{b}), b))}\big) && \text{by def. of } \mathcal{P}, \epsilon_B, !_{\mathcal{P}}\\
&= (e, (\pi, \overline{a}), b)_{(t, (c, (\pi, \overline{a}), a), \alpha(x))} && \text{by def. of } \epsilon_\tau
\end{aligned}
$$

And thus, we have the equivalence we require. The other diagram equalities we need to verify are done in a similar way.

The key point to remember from this proof, and the similar ones that need to be done, is that while the form of the positions is a bit heavy, the structures that underlie them do most of the work for us, making most of the needed verifications very easy once the positions have been properly described.

We apply those methods to verify that ! is indeed a pseudocomonad, to define and verify that d_A, e_A are proper pseudonatural transformations, and to check that !, along with those transformations, does have the structure of a linear exponential modality.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **A Syntactic View of Computational Adequacy**

Marco Devesas Campos(B) and Paul Blain Levy

School of Computer Science, University of Birmingham, Birmingham, UK *{*m.devesascampos,pbl*}*@cs.bham.ac.uk

**Abstract.** When presenting a denotational semantics of a language with recursion, it is necessary to show that the semantics is computationally adequate, i.e. that every divergent term denotes the "bottom" element of a domain.

We explain how to view such a theorem as a purely syntactic result. Any theory (congruence) that includes basic laws and is closed under an infinitary rule that we call "rational continuity" has the property that every divergent term is equated with the divergent constant. Therefore, to prove a model adequate, it suffices to show that it validates the basic laws and the rational continuity rule. While this approach was inspired by the categorical, ordered framework of Abramsky et al., neither category theory nor order is needed.

The purpose of the paper is to present this syntactic result for call-bypush-value extended with term-level recursion and polymorphic types. Our account begins with PCF, then includes sum types, then moves to call-by-push-value, and finally includes polymorphic types.

### **1 Introduction**

*Models of Recursion.* A conventional denotational account of a language with recursion proceeds as follows. First define the syntax and operational semantics. Then give a denotational model. Lastly, prove *soundness*, i.e. if t evaluates to u (written t ⇓ u) then ⟦t⟧ = ⟦u⟧, and *adequacy*, i.e. if t diverges (written t ⇑) then ⟦t⟧ = ⊥.

Because it is often convenient to structure a model categorically, Fiore and Plotkin (1994) gave categorical axioms on a model that imply (soundness and) adequacy. Crucially, in their work, as detailed by Fiore (1996), a model is required to be "ω**Cpo**-enriched", meaning that a term denotes an element of a pointed ω-cpo (poset with least element ⊥ and suprema of all increasing ω-chains), and a term constructor is ω-continuous (preserves suprema of ω-chains). Thus (for a call-by-name language) a term x : A ⊢ t : A gives a continuous endofunction f, and the recursion **rec** x.M denotes the supremum of (f^n ⊥)_{n∈ℕ}, the least (pre)fixpoint of f.

P. B. Levy—Research Supported by UK EPSRC Grant EP/N023757/1.

c The Author(s) 2018

C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 71–87, 2018. https://doi.org/10.1007/978-3-319-89366-2\_4

However, for the models of Abramsky et al. (2000), Abramsky and McCusker (1997), and McCusker (1998), the requirement of ω**Cpo**-enrichment is too restrictive, because the posets arising do not have suprema of *all* increasing ω-chains (Normann 2006). So these papers use a more relaxed ordered framework where the only suprema that must be preserved are those of chains (f^n ⊥)_{n∈ℕ} of iterated applications. This means that any so-called *rational chain* (g(f^n ⊥))_{n∈ℕ} has a least upper bound given by g(⊔_n f^n ⊥), a property known as *rational continuity* (Wright et al. 1976; cf. also Bloom and Ésik 1993).

*Recursion but Rationally.* Our goal is to give an even more relaxed version of this "rational" framework for adequacy; one that uses no category theory, order or denotational model. It could be viewed as a purely syntactic result: a property of a *theory* (congruence) <sup>≈</sup> rather than of a model. Thus we want t ⇓ u to imply t <sup>≈</sup> u, and t ⇑ to imply t <sup>≈</sup> Ω, where Ω is a divergent constant. The benefit of such a result is to modularize the narrative described at the start; we can get adequacy out of the way before we start studying categorical and denotational semantics.

*Rational Continuity.* Currently we have accomplished this goal for term-level recursion and polymorphic types. (Recursive and existential types are left to future work; see Sect. 6.) Our result is that any theory (congruence) ≈ will be sound and adequate provided it (a) contains the β-laws, fixpoint law and strictness laws and (b) is closed under an infinitary rule called *rational continuity*. This rule says (for a call-by-name language) that if C[**rec**^n x.t] ≈ D[**rec**^n x.t] for infinitely many n ∈ ℕ, then C[**rec** x.t] ≈ D[**rec** x.t]. Here we write **rec**^n x.t for the nth *approximant to recursion*, defined by the clauses **rec**^0 x.t := Ω and **rec**^{n+1} x.t := t[**rec**^n x.t / x].
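The approximants **rec**^n x.t can be modelled directly, with Ω as an escaping exception; the encoding of terms as Python functionals is an illustrative assumption, not the paper's operational semantics.

```python
class Omega(Exception):
    """Divergence, modelled as an escaping exception."""

def omega(*_):
    raise Omega()

def rec_approx(t, n):
    """n-th approximant of rec x.t: rec^0 = Omega, rec^(n+1) = t[rec^n / x]."""
    return omega if n == 0 else t(rec_approx(t, n - 1))

# t = lambda rec. lambda k. if k = 0 then 1 else k * rec(k - 1)  (factorial body)
t = lambda rec: lambda k: 1 if k == 0 else k * rec(k - 1)

assert rec_approx(t, 5)(4) == 24      # enough unrollings: the call converges
try:
    rec_approx(t, 3)(5)               # too few unrollings: the call hits Omega
    raised = False
except Omega:
    raised = True
assert raised
```

The two assertions make the intuition behind the approximants concrete: **rec**^n x.t behaves like **rec** x.t on computations that unfold the recursion fewer than n times, and diverges otherwise.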

*Plan.* To include both call-by-value (CBV) and call-by-name (CBN), we have established our result for call-by-push-value. The latter has both value types and computation types, but the treatment of value types in our proof is more complicated, so we begin in the CBN setting, which has only computation types. Our CBN account itself begins with PCF, which has only base types and function types; we then include sum types, using a proof method adapted from McCusker (1998). Next we move to call-by-push-value, and use *ultimate pattern matching* of values (Lassen and Levy 2008) to treat the value types. Finally we include polymorphic types.

*Related Work.* Adequacy of topos models has been studied using an internal language (Simpson 2004). Other adequacy results for polymorphic models include realizability semantics (Møgelberg 2009) and game semantics (Laird 2013).

### **2 PCF**

*Language.* We begin by introducing a version of Plotkin's PCF (1977) that replaces fixpoint combinators with recursion operators and an explicit divergence construct Ω (Table 1). As usual, terms are taken up to α-equivalence. The set


#### **Table 1.** PCF

of closed terms of type T will be denoted by CTerms_T and that of normal forms by NF_T. For a closed term t there is at most one v such that t ⇓ v; when there is none, we say it diverges and represent this by t ⇑.

#### **2.1 A Rationally Continuous Theory of PCF**

*The Theory.* A *congruence on terms* is a type-indexed equivalence relation on closed terms of said type satisfying t ≈ t′ ⟹ C[t] ≈ C[t′] for any context C[−] where the hole is closed. (We omit type annotations.) A congruence is a *rationally continuous* β-Ω*-fix theory* if it also satisfies the rules in Table 2.

The basis for the theory is given by the obvious β rules that mimic the reduction rules. In a similar vein, the fixpoint rule establishes that each recursive term is the fixpoint of a substitution. These rules alone are enough to establish the soundness of the theory with respect to reduction.

**Proposition 1 (Soundness).** *Any congruence* <sup>≈</sup> *satisfying the* β *and fixpoint rules (Table 2) is sound:* t ⇓ r <sup>=</sup><sup>⇒</sup> t <sup>≈</sup> r*.*

*A Converse.* Our sights now turn to proving that divergent terms are equated with Ω. The extra requirement calls for a more refined theory that can more closely mirror the behaviour of reduction. The last two sets of equations in Table 2 fill the gaps in what the reduction rules *don't* say about divergence. The first relates to the strictness of the operators: divergence of an argument leads to the divergence of the operator, e.g., Ωu ≈ Ω. The second is the rational continuity rule presented in the introduction.

**Table 2.** Rationally continuous β-Ω-fix theory of PCF

*Rational Continuity and Chains.* To prove adequacy, one often has to re-write or equate certain terms built with recursion either with some constant or as the unrolling of the recursive term a few times. In cpo models, continuity and compositionality of the interpretations validate the following rule

$$\frac{\forall n \in \mathbb{N}. \left[C[\mathbf{rec}^n \, x \,. t]\right] = \left[D[\mathbf{rec}^n \, x \,. t]\right]}{\left[C[\mathbf{rec} \, x \,. t]\right] = \left[D[\mathbf{rec} \, x \,. t]\right]}$$

But this can be further weakened by requiring equality only *at infinitely many* n, for then one would still be able to define chains with exactly the same least upper bounds. We write ∃^∞ n. P(n) to mean *there exist infinitely many* n *in* ℕ *for which* P(n) *holds*. This leads us to the syntactic continuity rule in Table 2. Since adequacy refers solely to closed terms, we only require this property for x : T ⊢ t : T, and therefore **rec**^n x.t and **rec** x.t are closed. Similarly, by a *rational chain* we mean a chain of the form C[**rec**^n x.t] for infinitely many n ∈ ℕ, and by its limit we mean the term C[**rec** x.t].
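Reusing the same kind of functional encoding of approximants (again an illustrative assumption, not the paper's semantics), one can observe the weakening concretely: a rational chain stabilizes once n is large enough, so sampling it only at an infinite subset of indices already determines the same limit value.

```python
class Omega(Exception):
    """Divergence, modelled as an escaping exception."""

def omega(*_):
    raise Omega()

def rec_approx(t, n):
    """n-th approximant: rec^0 = Omega, rec^(n+1) = t[rec^n / x]."""
    return omega if n == 0 else t(rec_approx(t, n - 1))

# t = lambda rec. lambda k. if k = 0 then 0 else rec(k - 1): fixpoint is constant 0.
t = lambda rec: lambda k: 0 if k == 0 else rec(k - 1)
C = lambda u: u(3)            # context C[-] = [-] applied to 3

# The rational chain C[rec^n x.t] stabilizes once n is large enough...
values = [C(rec_approx(t, n)) for n in range(4, 12)]
assert values == [0] * 8

# ...so sampling it only at the infinitely many even n determines the
# same limit value as sampling it everywhere.
even_values = [C(rec_approx(t, n)) for n in range(4, 12, 2)]
assert set(even_values) == set(values)
```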

#### **2.2 Adequacy**

*The Claim.* We now embark on the syntactic journey towards a proof that we have an adequate theory, formally that t ⇑ ⟹ t ≈ Ω. For the aforementioned reasons, the proof follows the usual approaches, replacing closure under bottom elements and least upper bounds of the relevant chains with closure under divergence and limits of rational chains.

*Approximations.* First we define abstractly<sup>1</sup> the notion of an approximation candidate between terms and the values they approximate; these are then extended to relations on terms. The concrete relations we use for each type are given by certain actions on approximation candidates (cf., e.g., Pitts 2000). When using the result of an action φ on approximation candidates ◁₁, …, ◁ₙ infix, we will sometimes surround the result with brackets, as in t ⟨φ(◁₁, …, ◁ₙ)⟩ u, to aid readability.

**Definition 1 (Approximation Candidates).** *An approximation candidate* ◁ *for a type* T *is a subset of CTerms*_T × *NFs*_T *s.t. (rational admissibility): for* x : T ⊢ t : T*,*

$$(\exists^{\infty} n.\; C[\mathbf{rec}^n\, x.\, t] \lhd v) \implies C[\mathbf{rec}\, x.\, t] \lhd v$$

**Proposition 2.** *If* ◁ *is an approximation candidate for type* T*, then the binary relation* ◁^c *on CTerms*_T *defined by*

t ◁^c u ⇐⇒ t ≈ Ω *or* (∃v. u ⇓ v *and* t ◁ v)

*satisfies the following properties:*


$$(\exists^{\infty} n.\; C[\mathbf{rec}^n\, x.\, t] \lhd^c u) \implies C[\mathbf{rec}\, x.\, t] \lhd^c u$$

*Proof. To give a taste of how the proofs go using rational admissibility, assume we have* ∃^∞n. C[**rec**^n x.t] ◁^c u. *From the definition, one of two options (possibly both) is true: either an infinite number of the terms on the left are identical to* Ω*, or, for infinitely many* m*,* C[**rec**^m x.t] *is related to the value* v *that* u *reduces to (determinism of reduction is paramount here). Admissibility then follows by rational continuity in the first case (using the obvious constant context), and by admissibility of* ◁ *(Definition 1) in the second.*

<sup>1</sup> Anticipating our treatment of polymorphism in Sect. 4, we have purposefully set up here a proof structure in the style of Girard (1989).

**Proposition 3 (Base Type Actions).** *The two binary relations* ◁_Bool ⊆ *CTerms*_Bool × *NFs*_Bool *and* ◁_Nat ⊆ *CTerms*_Nat × *NFs*_Nat *defined by*

t ◁_Bool v ⇐⇒ t ≈ v *and* t ◁_Nat v ⇐⇒ t ≈ v

*are approximation candidates for Bool and Nat.*

**Proposition 4 (Arrow Action).** *Given approximation candidates* ◁_T *for* T *and* ◁_U *for* U*, the binary relation between CTerms*_{T→U} *and NFs*_{T→U}

> t ⟨◁_T → ◁_U⟩ λx.u ⇐⇒ ∀p ◁^c_T q. t p ◁^c_U u[q/x]

*is an approximation candidate for* T → U*.*

**Definition 2 (Approximation Relation).** *The approximation relation* ◁_T *is the type-indexed family of approximation candidates defined by induction on types, where base types are covered by their respective actions (Proposition 3), and* ◁_{T→U} = ◁_T → ◁_U *(Proposition 4).*

**Definition 3 (Environments).** *Given a typing context* Γ*, an* environment σ *for* Γ *is a substitution that maps each* x : T ∈ Γ *to a closed term* ⊢ σ(x) : T*. If* σ₁ *and* σ₂ *are two such, we write* σ₁ ◁^c_Γ σ₂ *to mean* σ₁(x) ◁^c_T σ₂(x) *for all* x : T ∈ Γ*.*

**Proposition 5.** *For any* Γ ⊢ t : T *and environments* σ₁ ◁^c_Γ σ₂*,* t[σ₁] ◁^c_T t[σ₂]*.*

**Corollary 1 (Adequacy).** *For every closed* t : T*,* t ⇑ =⇒ t ≈ Ω*.*

*Proof. Applying Proposition 5 to* ⊢ t : T *(for the empty substitution), we conclude that* t ◁^c_T t*; the definition of* (−)^c *(Proposition 2) then asserts that either* t ≈ Ω *or* (t ⇓ v *and* t ◁_T v)*; whence, if* t ⇑*, it can only be that* t ≈ Ω.

### **3 PCF with Sums**

*The Extension.* Sums provide a slight complication—but one which shows the adaptability of the method. The extension to call-by-name sums is presented in Table 3. With the new reduction rules come new β rules and divergence rules in the theory (Table 4). As before, reduction is deterministic and the theory is sound.

#### **3.1 Adequacy**

*Action.* The action for sums must reflect the structure of its parameters. That is, we expect t ⟨◁_T + ◁_U⟩ **inl** u exactly when (modulo the theory) t decomposes into some **inl** t′ for which t′ ◁_T u. The assertion of that existence, though, causes us a small hiccup<sup>2</sup> in proving that − ⟨◁_T + ◁_U⟩ v is rationally admissible: if we have

<sup>2</sup> A hiccup that will be much amplified in the proof of admissibility for *F A* (Sect. 4).


**Table 3.** Extension of PCF with binary sums

$$\frac{\Gamma \vdash t : T}{\Gamma \vdash \mathbf{inl}\, t : T + T'} \qquad \frac{\Gamma \vdash t : T'}{\Gamma \vdash \mathbf{inr}\, t : T + T'} \qquad \frac{\Gamma \vdash t : T + T' \quad x : T, \Gamma \vdash u : U \quad y : T', \Gamma \vdash u' : U}{\Gamma \vdash \mathbf{match}\; t\; \mathbf{as}\; \{\mathbf{inl}\, x.\, u \,,\, \mathbf{inr}\, y.\, u'\} : U}$$


**Table 4.** Extension of the theory in Table 2 with binary sums

a series of C[**rec**^n x.t] ⟨◁_T + ◁_U⟩ **inl** u, then we know that each of the terms on the left must be identical to some **inl** t_n with t_n ◁_T u; but do the t_n form a rational chain? It turns out that, whenever t ≈ **inl** t′, and because each type is inhabited by Ω, there is a context that can extract the t′ (up to equivalence, obviously) directly from the original term. (An idea we borrowed from McCusker 1998.)

**Lemma 1.** *The contexts*

𝒯^l[−] = **match** − **as** {**inl** x. x , **inr** y. Ω}    𝒯^r[−] = **match** − **as** {**inl** x. Ω , **inr** y. y}

*satisfy* t ≈ **inl** u =⇒ 𝒯^l[t] ≈ u *and* t ≈ **inr** u =⇒ 𝒯^r[t] ≈ u*.*

**Proposition 6 (Sum Action).** *Given approximation candidates* ◁_T *for* T *and* ◁_U *for* U*, the relation between CTerms*_{T+U} *and NFs*_{T+U} *defined by*

> t ⟨◁_T + ◁_U⟩ **inl** u ⇐⇒ (∃t′ ◁^c_T u. t ≈ **inl** t′)

> t ⟨◁_T + ◁_U⟩ **inr** u ⇐⇒ (∃t′ ◁^c_U u. t ≈ **inr** t′)

*is an approximation candidate for* T + U*.*

*Proof. For rational admissibility, the pre-condition must hold for (at least) one of the two clauses in the definition. Say we have* ∃^∞n. C[**rec**^n x.t] ⟨◁_T + ◁_U⟩ **inl** u *with each term on the left equivalent to some* **inl** t_n*; rewriting* t_n ≈ 𝒯^l[C[**rec**^n x.t]] *(Lemma 1), it follows that (Proposition 2)*

$$C[\mathbf{rec}^n\, x.\, t] \approx \mathbf{inl}\; \mathcal{T}^l[C[\mathbf{rec}^n\, x.\, t]] \quad \text{and} \quad \mathcal{T}^l[C[\mathbf{rec}^n\, x.\, t]] \lhd^c_T u$$

*An application of rational continuity of the theory, and one of rational admissibility of* ◁^c_T *(again, Proposition 2), yields* C[**rec** x.t] ≈ **inl** 𝒯^l[C[**rec** x.t]] *and also* 𝒯^l[C[**rec** x.t]] ◁^c_T u*, so that* C[**rec** x.t] ⟨◁_T + ◁_U⟩ **inl** u. *(Likewise for the right injection.)*

*Adequacy.* The rest of the proof of adequacy follows exactly as before. Approximation candidates for sums are derived by induction using the sum action; and with them we can extend Proposition 5.

### **4 Call-by-Push-Value**

*Values vs. Computations.* We now turn to call-by-push-value (Levy 2004). This language (Table 5) distinguishes between values and computations, with value types represented by A, A′, etc., and computation types by B, B′, etc. The set of closed values of type A will be represented by Vals_A; that of closed computations of type B by Comps_B. Variables always have value type. Here we include value products and sums, products of computation types B Π B′, types F A for computations aiming to return a value, and functions, which in CBPV are computations taking values to computations. Central to CBPV, we also include value types UB of suspended computations of type B, which can be of one of two forms.

*Recursion.* In addition to the usual **thunk**s of computations, we also have recursively defined thunks **threc** x.t. An alternative would be to use recursive computations Γ ⊢^c **rec** x.t : B. Although the two are equivalent via the definitions **rec** x.t := **force threc** x.t and **threc** x.t := **thunk rec** x.t, there are two reasons for preferring **threc**. One is that, in some denotational models (e.g. state- or continuation-passing), **threc** has a simpler denotation than **rec**. The other is that a treatment based on **threc** is more easily adapted to call-by-value, where recursion and lambda are combined.

*Evaluation.* Evaluation (Table 6) pertains only to computations. Those on the codomain side of the evaluation relation ⇓ are called the *terminal* computations or, alternatively, the normal forms; their (type-indexed) set is represented by NFs_B. Since we have two forms of thunked computations, the action of forcing one into execution must act accordingly; this *unthunk*ing (a derived operation on the syntax) returns the computation suspended inside a **thunk**, or plucks out the computation from a **threc** x.t suitably instantiated by the recursive thunk itself, i.e. t[**threc** x.t/x]. Note that reduction is deterministic.
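The two unthunking clauses can be sketched as follows, over a hypothetical tuple encoding of CBPV syntax (the encoding and the helper names are our assumptions): `unthunk` implements unthunk(**thunk** t) = t and unthunk(**threc** x.t) = t[**threc** x.t/x].

```python
def subst(term, x, s):
    """Replace free occurrences of variable x by the closed term s."""
    tag = term[0]
    if tag == "var":
        return s if term[1] == x else term
    if tag == "threc" and term[1] == x:
        return term  # the binder shadows x
    return (tag,) + tuple(
        subst(a, x, s) if isinstance(a, tuple) else a for a in term[1:]
    )

def unthunk(v):
    """unthunk(thunk t) = t;  unthunk(threc x.t) = t[threc x.t / x]."""
    if v[0] == "thunk":
        return v[1]
    if v[0] == "threc":
        _, x, t = v
        return subst(t, x, v)  # instantiate by the recursive thunk itself
    raise ValueError("not a thunked computation")

# forcing threc x. return x unrolls the recursive thunk once:
tr = ("threc", "x", ("return", ("var", "x")))
print(unthunk(tr))  # ('return', ('threc', 'x', ('return', ('var', 'x'))))
```

This is exactly the step taken by **force** during evaluation, which is why reduction stays deterministic: each form of thunk has a single unthunking clause.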

**Table 5.** Call-by-push-value with recursion—syntax

$$A, A', \dots = 1 \mid A \times A' \mid 0 \mid A + A' \mid U\underline{B} \qquad \underline{B}, \underline{B}', \dots = FA \mid A \to \underline{B} \mid 1_{\Pi} \mid \underline{B} \mathbin{\Pi} \underline{B}'$$


**Table 6.** Call-by-push-value with recursion—reduction


#### **4.1 Theory**

*Theory.* By a (CBPV) congruence on closed terms we mean a type-indexed equivalence relation ≈ on closed values and computations such that for all closed terms t ≈ t′ and every (value or computation) context C[−] we have C[t] ≈ C[t′], respectively. A congruence is a *rationally continuous* β-Ω*-fix theory* when it satisfies the rules in Table 7. Rational chains are now those built by the application of a context C[−] to the (thunked) approximants **threc**^n of recursive thunks, which are defined by the clauses **threc**^0 x.t = **thunk** Ω and **threc**^{n+1} x.t = **thunk** t[**threc**^n x.t/x]; continuity is defined accordingly. Any congruence including the β and fixpoint rules is easily seen to be sound. We shall show that with the remaining rules it is also adequate.

**Table 7.** Call-by-push-value with recursion—rationally continuous β-Ω-fix theory


### **4.2 Adequacy**

*Values: Empty Shells.* In the proof of adequacy for PCF with sums we were required to introduce the tests so that we could, metaphorically, peek inside the injections and transform the rational chains there into equivalent ones with the properties we needed (cf. proof of Proposition 6). Here the problem expands to *all value types*. When checking rational admissibility, we need to decompose a value into its *ultimate pattern* and its constituent thunks (Lassen and Levy 2008, following ideas from Abramsky and McCusker 1997; also discernible in the work of Zeilberger 2008) and use those to find equivalent chains that can be used to establish adequacy.

**Definition 4 (Ultimate Patterns).** *The set of ultimate patterns UP*^A *for a value type* A *is given by induction on the following rules:* −_{UB} ∈ *UP*^{UB}*,* ⟨⟩ ∈ *UP*^1 *and*

$$\frac{p \in UP^A \quad p' \in UP^{A'}}{\langle p, p' \rangle \in UP^{A \times A'}} \quad \frac{p \in UP^A}{\text{in1} \, p \in UP^{A+A'}} \quad \frac{p \in UP^{A'}}{\text{inr} \, p \in UP^{A+A'}}$$

*For a given ultimate pattern* p ∈ *UP*^A*, the finite sequence* H(p) *of hole-types in pattern* p *is given by induction:*

$$\begin{aligned} H(-_{U\underline{B}}) &= (U\underline{B}) & H(\langle\rangle) &= \epsilon & H(\langle p, p' \rangle) &= H(p) \mathbin{+\!\!+} H(p')\\ H(\mathbf{inl}\, p) &= H(p) & H(\mathbf{inr}\, p) &= H(p) \end{aligned}$$

**Proposition 7 (Value Decomposition).** *Given* ⊢^v v : A*, there is a unique* p ∈ *UP*^A *and a unique sequence* (⊢^v v_i : H(p)_i)_{i<|H(p)|}*—the filling—for which* v = p @ (v_i)_{i<|H(p)|}*, using the reassembly function*

$$\begin{aligned} (-_{U\underline{B}}) \mathbin{@} (v) &= v & \langle\rangle \mathbin{@} \epsilon &= \langle\rangle \\ \mathbf{inl}\, p \mathbin{@} (v_i)_{i<|H(p)|} &= \mathbf{inl}\,(p \mathbin{@} (v_i)_{i<|H(p)|}) & \mathbf{inr}\, p \mathbin{@} (v_i)_{i<|H(p)|} &= \mathbf{inr}\,(p \mathbin{@} (v_i)_{i<|H(p)|}) \end{aligned}$$

$$\langle p, p' \rangle \mathbin{@} \big((v_i)_{i<|H(p)|} \mathbin{+\!\!+} (v'_i)_{i<|H(p')|}\big) = \big\langle p \mathbin{@} (v_i)_{i<|H(p)|},\; p' \mathbin{@} (v'_i)_{i<|H(p')|} \big\rangle$$
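Proposition 7 can be made concrete with a small sketch: over a hypothetical tuple encoding of CBPV values, `decompose` replaces every suspended computation by a hole and collects the filling left-to-right, and `reassemble` is the @ operation that inverts it. The encoding and the function names are our assumptions.

```python
def decompose(v):
    """Split a value into (ultimate pattern, filling).
    The pattern is v with every thunk replaced by the hole "-";
    the filling lists the suspended computations left-to-right (cf. H(p))."""
    tag = v[0]
    if tag in ("thunk", "threc"):        # a suspended computation: one hole
        return "-", [v]
    if tag == "unit":                    # <> has no holes
        return ("unit",), []
    if tag in ("inl", "inr"):
        p, fill = decompose(v[1])
        return (tag, p), fill
    if tag == "pair":
        p1, f1 = decompose(v[1])
        p2, f2 = decompose(v[2])
        return ("pair", p1, p2), f1 + f2  # H(<p,p'>) = H(p) ++ H(p')
    raise ValueError("not a value")

def reassemble(p, fill):
    """p @ fill: plug the suspended computations back into the holes."""
    def go(p, it):
        if p == "-":
            return next(it)
        if p[0] == "unit":
            return ("unit",)
        if p[0] in ("inl", "inr"):
            return (p[0], go(p[1], it))
        return ("pair", go(p[1], it), go(p[2], it))
    return go(p, iter(fill))

v = ("inl", ("pair", ("thunk", "t1"), ("thunk", "t2")))
p, fill = decompose(v)
print(p)                         # ('inl', ('pair', '-', '-'))
print(reassemble(p, fill) == v)  # True
```

The round trip illustrates the uniqueness claim: the pattern records all the value structure, the filling all the suspended behaviour.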

*Tests.* Ultimate patterns let us define the tests that extract the computations embedded in a given value. As in the PCF sum case, we can use them to define values that are equivalent to a given one but are built using only that value itself. If the values are derived from some family of contexts for the holes, then we can derive an equivalent context from the respective ultimate pattern.

**Definition 5.** *For* p ∈ *UP*^A *and* i < |H(p)|*, we define a context* 𝒯^p_i[−] *by induction on* p ∈ *UP*^A *using the rules below. Note that when* Γ ⊢^v − : A *the test has type* Γ ⊢^c 𝒯^p_i[−] : B_i *where* UB_i = H(p)_i*.*

$$\begin{aligned} \mathcal{T}^{-_{U\underline{B}}}_0[-] &= \mathbf{force}\,- \\ \mathcal{T}^{\mathbf{inl}\, p}_i[-] &= \mathbf{match}\,-\,\mathbf{as}\ \{\mathbf{inl}\, x.\, \mathcal{T}^p_i[x] \,,\, \mathbf{inr}\, y.\, \Omega\} \\ \mathcal{T}^{\mathbf{inr}\, p}_i[-] &= \mathbf{match}\,-\,\mathbf{as}\ \{\mathbf{inl}\, x.\, \Omega \,,\, \mathbf{inr}\, y.\, \mathcal{T}^p_i[y]\} \\ \mathcal{T}^{\langle p, p' \rangle}_{i<|H(p)|}[-] &= \mathbf{match}\,-\,\mathbf{as}\ \langle x, y\rangle.\, \mathcal{T}^p_i[x] \\ \mathcal{T}^{\langle p, p' \rangle}_{|H(p)|+i'}[-] &= \mathbf{match}\,-\,\mathbf{as}\ \langle x, y\rangle.\, \mathcal{T}^{p'}_{i'}[y] \end{aligned}$$

**Proposition 8 (Tests Decompose).** *Given a pattern* p ∈ *UP*^A*, a sequence* (⊢^v w_i : H(p)_i)_{i<|H(p)|}*, and* i < |H(p)|*, we have* 𝒯^p_i[p @ (w_i)_{i<|H(p)|}] ≈ **force** w_i*.*

**Proposition 9.** *For* ⊢^c t : F A *and* p ∈ *UP*^A*, if* t ≈ **return** p @ (v_i)_{i<|H(p)|} *then, successively:*

*1.* ∀i < |H(p)|. **thunk**(t **to** x. 𝒯^p_i[x]) ≈ v_i

*2.* p @ (v_i)_{i<|H(p)|} ≈ p @ (**thunk**(t **to** x. 𝒯^p_i[x]))_{i<|H(p)|}

*3.* t ≈ **return** p @ (**thunk**(t **to** x. 𝒯^p_i[x]))_{i<|H(p)|}

*Approximation Candidates.* Unlike PCF, where we have computations and normal forms, CBPV has three levels of syntax: values, terminals, and computations. For the purposes of defining the needed approximation candidates, terminals (read: normal forms) and computations behave like their PCF counterparts and have (now) familiar definitions of approximation candidates. Approximation candidates for value types enforce that only structurally similar values are related; that they are (left) closed under equivalence of their holes; and that they are closed under the usual chains.

**Definition 6 (Approximation Candidates).** *Given a value type* A*, an approximation candidate for* A *is a subset* ◁ *of Vals*_A × *Vals*_A *such that*


(∀i < |H(p)|. v_i ≈ w_i) =⇒ p @ (v_i)_{i<|H(p)|} ◁ p @ (w_i)_{i<|H(p)|}

*3. Rational Admissibility: for* x : UB ⊢^c t : B

$$(\exists^\infty n. V[\mathtt{threc}^n x.t] \lhd w) \implies V[\mathtt{threc} x.t] \lhd w$$

*Given a computation type* B*, an approximation candidate for* B *is a subset of Comps*_B × *NFs*_B *such that*


$$(\exists^{\infty} n.\; C[\mathtt{threc}^{n}\, x.\, t] \lhd r) \implies C[\mathtt{threc}\, x.\, t] \lhd r$$

**Proposition 10.** *Given a (computation) approximation candidate* ◁ *on* B*, define its closure as the binary relation* ◁^c ⊆ *Comps*_B × *Comps*_B *where*

t ◁^c u ⇐⇒ t ≈ Ω *or* (∃r. u ⇓ r *and* t ◁ r)

*It satisfies the following properties:*


$$(\exists^{\infty} n.\; C[\mathtt{threc}^{n}\, x.\, t] \lhd^{c} u) \implies C[\mathtt{threc}\, x.\, t] \lhd^{c} u$$

*Actions.* We can then define the actions on these approximation candidates associated with each type constructor. Mostly this is done by structure (for values) or by use (for computations); the exceptions are U types and F types, which we define, respectively, by use and by structure. Note that it is the existential quantification in the definition of the F action that, very much like for PCF sums, requires the use of the tests. Using them, we can easily define, by induction, the approximation relation and thereby establish the adequacy of the theory.

**Proposition 11 (Thunk Action).** *Let* ◁ *be an approximation candidate for* B*. Then the binary relation*

v ⟨U(◁)⟩ w ⇐⇒ **force** v ◁^c *unthunk* w

*is an approximation candidate for* UB*.*

**Proposition 12 (F Action).** *Let* ◁ *be an approximation candidate for* A*. Then the following is an approximation candidate for* F A*:*

t ⟨F(◁)⟩ **return** w ⇐⇒ ∃v ◁ w. t ≈ **return** v

**Definition 7 (Environments).** *Given a typing context* Γ*, an* environment σ *for* Γ *is a substitution that maps each* x : A ∈ Γ *to a closed value* ⊢^v σ(x) : A*. If* σ₁ *and* σ₂ *are two such, we write* σ₁ ◁_Γ σ₂ *to mean* σ₁(x) ◁_A σ₂(x) *for all* x : A ∈ Γ*.*

**Proposition 13.** *For any* Γ ⊢^c t : B *(resp.* Γ ⊢^v v : A*) and environments* σ₁ ◁_Γ σ₂*, we have* t[σ₁] ◁^c_B t[σ₂] *(resp.* v[σ₁] ◁_A v[σ₂]*).*

**Corollary 2 (Adequacy).** *For any computation* ⊢^c t : B*, if* t ⇑ *then* t ≈ Ω*.*

### **5 Polymorphic Call-by-Push-Value**

*Adequacy, Now For All.* Our final extension deals with polymorphism. In call-by-push-value, polymorphic types are computation types. We may quantify over both value and computation types. The extension is presented in Table 8.

We assume two disjoint countable sets of variables, X, Y, … ∈ VVars and X, Y, … ∈ CVars, for value and computation types (resp.). Types are now also considered up to α-equivalence. They will also be considered in context, Θ ⊢^C B and Θ ⊢^V A, where Θ is some finite subset of VVars ∪ CVars that includes the free type variables of A or B. (These type judgements have an obvious inductive definition.) The proper extension of a type context Θ by a type variable χ will be denoted by χ, Θ. Typing judgements also need to be annotated by a type context, as in Θ; Γ ⊢^c t : B, where Θ includes all the free type variables in the types of Γ and B. The previous typing rules are extended in the evident way.


**Table 8.** Polymorphic Call-by-push-value with recursion

**Table 9.** Extension of the theory in Table 7 to polymorphism


*Reduction and Theory.* Reduction—defined only for closed terms of closed type—is still deterministic. On the theory end of things, we equate only closed terms of closed type so that we need only extend the theory of Sect. 4 with the obvious β and divergence rules (Table 9). Unsurprisingly, soundness still stands.

#### **5.1 Adequacy**

*Approximation Candidates and Actions.* Throughout we have worked with approximation candidates—and now we can reap the fruits of that work. The definition of approximation candidates (Definition 6) and of their extension to computations (Proposition 10) can stay exactly the same; as can the actions for non-polymorphic type constructors. The actions of polymorphic types follow.

**Proposition 14.** *Let* Y ⊢^C B *be a computation type, and* φ *a mapping that assigns to every closed type* T *and approximation candidate* ◊ ∈ *ACs*^T *an approximation candidate* φ_{T,◊} ∈ *ACs*^{B[T/Y]}*; then*

$$t \left\langle \textstyle\prod Y.\, \phi \right\rangle \Lambda Y.\, u \iff \textit{for all } \vdash^C T,\ \Diamond \in ACs^T.\ t\,T \left\langle \phi_{T,\Diamond} \right\rangle^{c} u[T/Y]$$

*is an approximation candidate for* ∏Y.B*, and likewise for computation type variables.*

*Approximations.* The approximation relations need to be parametrized by the candidates that will instantiate the type variables, so that in the end we arrive at a candidate for a closed type. As usual, it satisfies the weakening and substitution properties that are used in the proof of adequacy for abstractions and type instantiations, respectively.

**Definition 8 (Approximation Environment).** *An* approximation environment γ *for* Θ *is a map taking each* χ ∈ Θ *to a closed type* γ^T(χ) *of the same kind as* χ *and an approximation candidate* γ^C(χ) ∈ *ACs*^{γ^T(χ)}*.*

**Definition 9 (Parametrized Approximation Relations).** *Let* <sup>Θ</sup> <sup>V</sup> A *(resp.* Θ <sup>C</sup> <sup>B</sup>*) be a (possibly open) type and* <sup>γ</sup> *an approximation environment for* Θ*. The following* parametrized approximation relations*, defined by induction on types, determine an approximation candidate for* A[γ<sup>T</sup> ]*—i.e.* <sup>A</sup> *with each type variable* χ *replaced with* γ<sup>T</sup> (χ) *(resp.* <sup>B</sup>[γ<sup>T</sup> ]*).*

$$\begin{aligned}
\lhd^{\gamma}_{\Theta \vdash^V X} &= \gamma^C(X) & \lhd^{\gamma}_{\Theta \vdash^C \underline{X}} &= \gamma^C(\underline{X})\\
\lhd^{\gamma}_{\Theta \vdash^V 1} &= 1 & \lhd^{\gamma}_{\Theta \vdash^V A \times A'} &= (\lhd^{\gamma}_{\Theta \vdash^V A}) \times (\lhd^{\gamma}_{\Theta \vdash^V A'})\\
\lhd^{\gamma}_{\Theta \vdash^V 0} &= 0 & \lhd^{\gamma}_{\Theta \vdash^V A + A'} &= (\lhd^{\gamma}_{\Theta \vdash^V A}) + (\lhd^{\gamma}_{\Theta \vdash^V A'})\\
\lhd^{\gamma}_{\Theta \vdash^V U\underline{B}} &= U(\lhd^{\gamma}_{\Theta \vdash^C \underline{B}}) & \lhd^{\gamma}_{\Theta \vdash^C FA} &= F(\lhd^{\gamma}_{\Theta \vdash^V A})\\
\lhd^{\gamma}_{\Theta \vdash^C 1_{\Pi}} &= 1_{\Pi} & \lhd^{\gamma}_{\Theta \vdash^C \underline{B} \Pi \underline{B}'} &= (\lhd^{\gamma}_{\Theta \vdash^C \underline{B}}) \mathbin{\Pi} (\lhd^{\gamma}_{\Theta \vdash^C \underline{B}'})\\
\lhd^{\gamma}_{\Theta \vdash^C A \to \underline{B}} &= (\lhd^{\gamma}_{\Theta \vdash^V A}) \to (\lhd^{\gamma}_{\Theta \vdash^C \underline{B}}) & &\\
\lhd^{\gamma}_{\Theta \vdash^C \prod Y.\underline{B}} &= \textstyle\prod Y.\, \lhd^{\gamma[Y \mapsto (-,=)]}_{Y,\Theta \vdash^C \underline{B}} & \lhd^{\gamma}_{\Theta \vdash^C \prod \underline{Y}.\underline{B}} &= \textstyle\prod \underline{Y}.\, \lhd^{\gamma[\underline{Y} \mapsto (-,=)]}_{\underline{Y},\Theta \vdash^C \underline{B}}
\end{aligned}$$

**Definition 10.** *For any* Θ *and approximation environment* γ *for* Θ*, if* σ₁ *and* σ₂ *are environments for* Γ[γ^T]*, we write* σ₁ ⟨◁^γ_{Θ;Γ}⟩ σ₂ *to mean* σ₁(x) ⟨◁^γ_{Θ⊢V A}⟩ σ₂(x) *for every* x : A ∈ Γ*.*

**Proposition 15.** *For any* Θ; Γ ⊢^c t : B *(resp.* Θ; Γ ⊢^v v : A*), approximation environment* γ *for* Θ*, and environments* σ₁ ⟨◁^γ_{Θ;Γ}⟩ σ₂ *for* Γ*,*

$$t[\gamma^T][\sigma_1] \left\langle \lhd^{\gamma}_{\Theta \vdash^C \underline{B}} \right\rangle^{c} t[\gamma^T][\sigma_2] \qquad \left(\textit{resp. } v[\gamma^T][\sigma_1] \left\langle \lhd^{\gamma}_{\Theta \vdash^V A} \right\rangle v[\gamma^T][\sigma_2] \right)$$

### **6 Concluding Remarks**

We have thus seen how, for term-level recursion, the rational continuity rule coupled with β, the fixpoint property of recursion, and strictness of the basic constructors of the language suffices to make a theory adequate. The recipe of the previous sections applies to both call-by-name and call-by-value languages and is compatible with polymorphic types. Along the way we used no category theory; no models were mentioned. We relied only on syntactic constructions and required no external machinery.

Two extensions are conspicuous for their absence: to existential types and to recursive types. In Call-by-push-value, existential types are value types. We conjecture our theorem holds for them but we must find a way to quantify over ultimate patterns. For recursive types, even finding suitable conditions on ≈ is challenging. We would like to adapt Pitts' (1996) method of minimal invariant relations but we will need type constructors to be functorial over suitable syntactic categories.

For term-recursion and polymorphism, however, we now know that to prove a model adequate we need only to show that it satisfies the basic laws and rational continuity.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Linearity

### A New Linear Logic for Deadlock-Free Session-Typed Processes

Ornela Dardha(B) and Simon J. Gay

School of Computing Science, University of Glasgow, Glasgow, UK {Ornela.Dardha,Simon.Gay}@glasgow.ac.uk

Abstract. The π-calculus, viewed as a core concurrent programming language, has been used as the target of much research on type systems for concurrency. In this paper we propose a new type system for deadlock-free session-typed π-calculus processes, by integrating two separate lines of work. The first is the propositions-as-types approach by Caires and Pfenning, which provides a linear logic foundation for session types and guarantees deadlock-freedom by forbidding cyclic process connections. The second is Kobayashi's approach in which types are annotated with priorities so that the type system can check whether or not processes contain genuine cyclic dependencies between communication operations. We combine these two techniques for the first time, and define a new and more expressive variant of classical linear logic with a proof assignment that gives a session type system with Kobayashi-style priorities. This can be seen in three ways: (i) as a new linear logic in which cyclic structures can be derived and a Cycle-elimination theorem generalises Cut-elimination; (ii) as a logically-based session type system, which is more expressive than Caires and Pfenning's; (iii) as a logical foundation for Kobayashi's system, bringing it into the sphere of the propositions-as-types paradigm.

### 1 Introduction

The Curry-Howard correspondence, or propositions-as-types paradigm, provides a canonical logical foundation for functional programming [42]. It identifies types with logical propositions, programs with proofs, and computation with proof normalisation. It was natural to ask for a similar account of concurrent programming, and this question was brought into focus by the discovery of linear logic [24] and Girard's explicit suggestion that it should have some connection with concurrent computation. Several attempts were made to relate π-calculus processes to the proof nets of classical linear logic [1,8], and to relate CCS-like processes to the ∗-autonomous categories that provide semantics for classical linear logic [2]. However, this work did not result in a convincing propositions-as-types framework for concurrency, and did not continue beyond the 1990s.

© The Author(s) 2018

Supported by the UK EPSRC grant EP/K034413/1, "From Data Types to Session Types: A Basis for Concurrency and Distribution (ABCD)", and by COST Action IC1201, "Behavioural Types for Reliable Large-Scale Software Systems (BETTY)".

C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 91–109, 2018. https://doi.org/10.1007/978-3-319-89366-2_5

Fig. 1. Cyclic scheduler

Meanwhile, Honda *et al.* [26,27,38] developed *session types* as a formalism for statically checking that messages have the correct types and sequence according to a communication protocol. Research on session types developed and matured over several years, eventually inspiring Caires and Pfenning [12] to discover a Curry-Howard correspondence between dual intuitionistic linear logic [7] and a form of π-calculus with session types [38]. Wadler [41] subsequently gave an alternative formulation based on classical linear logic, and related it to existing work on session types for functional languages [23]. The Caires-Pfenning approach has been widely accepted as a propositions-as-types theory of concurrent programming, as well as providing a logical foundation for session types.

Caires and Pfenning's type system guarantees deadlock-freedom by forbidding cyclic process structures. It provides a logical foundation for deadlock-free session processes, complementing previous approaches to deadlock-freedom in session type systems [9,15,21,22]. The logical approach to session types has been extended in many ways, including features such as dependent types [39], failures and non-determinism [11], sharing and races [6]. All this work relies on the acyclicity condition. However, rejecting cyclic process structures is unnecessarily strict: they are a necessary, but not sufficient, condition for the existence of deadlocked communication operations. As we will show in Example 1 (Fig. 1), there are deadlock-free processes that can naturally be implemented in a cyclic way, but are rejected by Caires and Pfenning's type system.

Our contribution is to define a new logic, *priority-based linear logic* (PLL), and formulate it as a type system for *priority-based* CP (PCP), which is a more expressive class of processes than Wadler's CP [41]. This is the first Curry-Howard correspondence that allows cyclic interconnected processes, while still ensuring deadlock-freedom. The key idea is that PLL includes conditions on inter-channel dependencies based on Kobayashi's type systems [29,30,32]. Our work can be viewed in three ways: (i) as a new linear logic in which cyclic proof structures can be derived; (ii) as an extension of Caires-Pfenning type systems so that they accept more processes, while maintaining the strong logical foundation; (iii) as a logical foundation for Kobayashi-style type systems.

An example of a deadlock-free cyclic process is Milner's well-known scheduler [35], described in the following Example 1.

*Example 1 (Cyclic Scheduler,* Fig. 1*).* A set of agents A0, ..., An−1, for n > 1, is scheduled to perform a certain task in cyclic order, starting with agent A0. For all i ∈ {1, ..., n − 1}, agent Ai sends the result of its computation to a collector process Pi, before transmitting further data to agent A(i+1) mod n. At the end of the round, A0 sends the final result to P0. Here we define a finite version of Milner's scheduler, which executes one round of communication.

$$Sched \triangleq (\nu a_i b_i)_{i=0}^{n-1}\,(\nu c_i d_{(i+1)\bmod n})_{i=0}^{n-1}\,\big(A_0 \mid A_1 \mid \ldots \mid A_{n-1} \mid P_0 \mid P_1 \mid \ldots \mid P_{n-1}\big)$$

$$\begin{array}{llll} A\_0 & \triangleq c\_0[\mathbf{n}\_0].d\_0(x\_0).a\_0[\mathbf{m}\_0].\mathsf{close}\_0\\ A\_i & \triangleq d\_i(x\_i).a\_i[\mathbf{m}\_i].c\_i[\mathbf{n}\_i].\mathsf{close}\_i & i \in \{1, ..., n-1\}\\ P\_i & \triangleq b\_i(y\_i).Q\_i & i \in \{0, ..., n-1\} \end{array}$$

Prefix c0[**n**0] denotes an output on c0, and d0(x0) an input on d0. For now, let **m** and **n** denote data. Process close_i closes the channels used by Ai: the details of this closure are irrelevant here (however, they are as in processes Q and R in Example 2). Process Qi uses the message received from Ai in internal computation. The construct (*ν*ab) creates two channel endpoints a and b and binds them together. The system *Sched* is deadlock-free because A1, ..., An−1 each wait for a message from the previous agent before sending, and A0 sends the initial message.

*Sched* is not typable in the original type systems by Caires-Pfenning and Wadler. To do that, it would be necessary to break A0 into two parallel agents, A′0 ≜ c0[**n**0].close_{c0} and A″0 ≜ d0(x0).a0[**m**0].close_{d0,a0}. This changes the design of the system, yielding a different one. Moreover, if the scheduler continues into a second round of communication, this redesign is not possible because of the potential dependency from the input on d0 to the next output on c0. However, *Sched* is typable in PCP; we will show the type assignment at the end of Sect. 2.

There is a natural question at this point: *given that the cyclic scheduler is deadlock-free, is it possible to encode its semantics in* CP*, thus eliminating the need for* PCP? It is possible to define a centralised agent A that communicates with all the collectors Pi, resulting in a system that is semantically equivalent to our *Sched*. However, such an encoding has a global character, and changes the structure of the overall system from distributed to centralised. In programming terms, it corresponds to changing the software design, as we pointed out in Example 1, and ultimately the software architecture, which is not always desirable or even feasible. The aim of PCP is to generalise CP so that deadlock-free processes can be constructed with their natural structure. We would want any encoding of PCP into CP to be structure-preserving, which would mean translating the Cycle rule (given in Fig. 2) homomorphically; this is clearly impossible.

Contributions and Structure of the Paper. In Sect. 2 we define priority-based linear logic (PLL), which extends classical linear logic (CLL) with priorities attached to propositions. These priorities are based on Kobayashi's annotations for deadlock freedom [32]. By following the propositions-as-types paradigm, we define a term assignment for PLL proofs, resulting in priority-based classical processes (PCP), which extends Wadler's CP [41] with Mix and Cycle rules (Fig. 2). In Sect. 3 we define an operational semantics for PCP. In Sect. 4 we prove Cycle-elimination (Theorem 1) for PLL, analogous to the standard Cut-elimination theorem for CLL. Consequently, the results for PCP are subject reduction (Theorem 2), top-level deadlock-freedom (Theorem 3), and full deadlock-freedom for closed processes (Theorem 4). In Sect. 5 we discuss related work and conclude the paper.

### 2 PCP: Classical Processes with Mix and Cycle

Priority-based CP (PCP) follows the style of Wadler's Classical Processes (CP) [41], with details inspired by Carbone *et al.* [14] and Caires and Pérez [11].

Types. We start with types, which are based on CLL propositions. Let A, B range over types, given in Definition 1. Let o, κ ∈ N ∪ {ω} range over *priorities*, which are used to annotate types. Let ω be a special element such that o < ω for all o ∈ N. Often, we will omit ω. We will explain priorities later in this section.

Definition 1 (Types). *Types (*A, B*) are given by:*

$$A, B ::= \bot^{\mathsf{o}} \;\mid\; \mathbf{1}^{\mathsf{o}} \;\mid\; A \otimes^{\mathsf{o}} B \;\mid\; A \;⅋^{\mathsf{o}}\; B \;\mid\; \oplus^{\mathsf{o}}\{l_i : A_i\}_{i\in I} \;\mid\; \&^{\mathsf{o}}\{l_i : A_i\}_{i\in I} \;\mid\; {?}^{\mathsf{o}} A \;\mid\; {!}^{\mathsf{o}} A$$

⊥^o and **1**^o are associated with channel endpoints that are ready to be closed. A ⊗^o B (respectively, A ⅋^o B) is associated with a channel endpoint that first outputs (respectively, inputs) a channel of type A and then proceeds as B. ⊕^o{l_i : A_i}_{i∈I} is associated with a channel endpoint over which we can select a label from {l_i}_{i∈I} and proceed as A_i. Dually, &^o{l_i : A_i}_{i∈I} is associated with a channel endpoint that can offer a set of labelled types. ?^o A types a collection of clients requesting A. Dually, !^o A types a server repeatedly accepting A.

Duality on types is total and is given in Definition 2. It preserves priorities of types.

Definition 2 (Duality). *The* duality *function* (·)<sup>⊥</sup> *on types is given by:*

$$\begin{array}{ll}
(A \otimes^{\mathsf{o}} B)^\perp = A^\perp \;⅋^{\mathsf{o}}\; B^\perp & (\mathbf{1}^{\mathsf{o}})^\perp = \bot^{\mathsf{o}}\\
(A \;⅋^{\mathsf{o}}\; B)^\perp = A^\perp \otimes^{\mathsf{o}} B^\perp & (\bot^{\mathsf{o}})^\perp = \mathbf{1}^{\mathsf{o}}\\
(\&^{\mathsf{o}}\{l_i : A_i\}_{i\in I})^\perp = \oplus^{\mathsf{o}}\{l_i : A_i^\perp\}_{i\in I} & ({!}^{\mathsf{o}} A)^\perp = {?}^{\mathsf{o}} A^\perp\\
(\oplus^{\mathsf{o}}\{l_i : A_i\}_{i\in I})^\perp = \&^{\mathsf{o}}\{l_i : A_i^\perp\}_{i\in I} & ({?}^{\mathsf{o}} A)^\perp = {!}^{\mathsf{o}} A^\perp
\end{array}$$

Processes. Let P, Q range over processes, given in Definition 3. Let x, y range over channel endpoints, and **m**, **n** over channel endpoints of type either ⊥^o or **1**^o.

Definition 3 (Processes). *Processes (*P, Q*) are given by:*

$$\begin{array}{llll}
P, Q ::= & x[y].P & (output) & \mathbf{0} \quad (inaction)\\
& x(y).P & (input) & P \mid Q \quad (composition)\\
& x \lhd l_j.P & (selection) & (\nu x^A y)P \quad (session\ restriction)\\
& x \rhd \{l_i : P_i\}_{i\in I} & (branching) & x[\,].\mathbf{0} \quad (empty\ output)\\
& x \to y^A & (forwarding) & x().P \quad (empty\ input)
\end{array}$$

Process x[y].P (respectively, x(y).P) outputs (respectively, inputs) y on channel endpoint x, and proceeds as P. Process x◁l_j.P uses x to select l_j from a labelled choice process, typically being x▷{l_i : P_i}_{i∈I}, and triggers P_j; labels indexed by the finite set I are pairwise distinct. Process x→y^A forwards communications from x to y, the latter having type A. Processes also include the inaction process **0**, the parallel composition of P and Q, denoted P *|* Q, and the double restriction constructor (*ν*x^A y)P: the intention is that x and y denote dual session channel endpoints in P, and A is the type of x. Processes x[ ].**0** and x().P are the empty output and empty input, respectively. They denote the closure of a session from the viewpoint of each of the two communicating participants.

Notions of bound/free names in processes are standard; we write fn(P) to denote the set of free names of P. Also, we write P{x/z} to denote the (capture-avoiding) substitution of x for the free occurrences of z in P. Finally, we let x̃, which is different from x, denote a sequence x1, ..., xn for n > 0.

Typing Rules. Typing contexts, ranged over by Γ, Δ, Θ, are sets of typing assumptions x:A. We write Γ, Δ for union, requiring the contexts to be disjoint. A typing judgement P ⊢ Γ means "process P is well typed using context Γ".

Before presenting the typing rules, we need some auxiliary definitions. Our priorities are based on the annotations used by Kobayashi [32], but simplified to single priorities *à la* Padovani [37]. They obey the following laws: (i) an action with a smaller priority must be performed before an action with a greater priority; and (ii) dual actions in a communication must have equal priorities.
Definition 4 (Priority). *The* priority *function* pr(·) *on types is given by:*

$$\begin{array}{ll}
\mathsf{pr}(A \otimes^{\mathsf{o}} B) = \mathsf{pr}(A \;⅋^{\mathsf{o}}\; B) = \mathsf{o} & \mathsf{pr}(\bot^{\mathsf{o}}) = \mathsf{pr}(\mathbf{1}^{\mathsf{o}}) = \mathsf{o}\\
\mathsf{pr}(\oplus^{\mathsf{o}}\{l_i : A_i\}_{i\in I}) = \mathsf{pr}(\&^{\mathsf{o}}\{l_i : A_i\}_{i\in I}) = \mathsf{o} & \mathsf{pr}({?}^{\mathsf{o}} A) = \mathsf{pr}({!}^{\mathsf{o}} A) = \mathsf{o}
\end{array}$$

Definition 5 (Lift). *Let* <sup>t</sup> <sup>∈</sup> <sup>N</sup>*. The* lift *operator* <sup>↑</sup><sup>t</sup> (·) *on types is given by:*

$$\begin{array}{ll}
\uparrow^t(A \;⅋^{\mathsf{o}}\; B) = (\uparrow^t A) \;⅋^{(\mathsf{o}+t)}\; (\uparrow^t B) & \uparrow^t \bot^{\mathsf{o}} = \bot^{(\mathsf{o}+t)}\\
\uparrow^t(A \otimes^{\mathsf{o}} B) = (\uparrow^t A) \otimes^{(\mathsf{o}+t)} (\uparrow^t B) & \uparrow^t \mathbf{1}^{\mathsf{o}} = \mathbf{1}^{(\mathsf{o}+t)}\\
\uparrow^t(\&^{\mathsf{o}}\{l_i : A_i\}_{i\in I}) = \&^{(\mathsf{o}+t)}\{l_i : \uparrow^t A_i\}_{i\in I} & \uparrow^t({!}^{\mathsf{o}} A) = {!}^{(\mathsf{o}+t)}(\uparrow^t A)\\
\uparrow^t(\oplus^{\mathsf{o}}\{l_i : A_i\}_{i\in I}) = \oplus^{(\mathsf{o}+t)}\{l_i : \uparrow^t A_i\}_{i\in I} & \uparrow^t({?}^{\mathsf{o}} A) = {?}^{(\mathsf{o}+t)}(\uparrow^t A)
\end{array}$$

*We assume* ω + t = ω *for all* t ∈ N*.*

*The operator* ↑^t *is extended component-wise to typing contexts:* ↑^t Γ*.*
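As a cross-check, Definitions 2, 4 and 5 can be executed directly. The sketch below is our own illustration (the tuple encoding of types and all Python names are ours, not the paper's notation); it implements duality, pr and lift, and confirms on a sample type that duality is a priority-preserving involution and that lift adds t to every priority.

```python
# Our own tuple encoding of PLL types (not the paper's notation):
#   ("one", o), ("bot", o)                     -- 1^o and ⊥^o
#   ("tensor", o, A, B), ("par", o, A, B)      -- A ⊗^o B and A ⅋^o B
#   ("oplus", o, {l: A}), ("with", o, {l: A})  -- ⊕^o{...} and &^o{...}
#   ("query", o, A), ("bang", o, A)            -- ?^o A and !^o A
# OMEGA models ω: adding any t ∈ N leaves it unchanged.
OMEGA = float("inf")

DUAL = {"one": "bot", "bot": "one", "tensor": "par", "par": "tensor",
        "oplus": "with", "with": "oplus", "query": "bang", "bang": "query"}

def dual(t):
    """Duality (Definition 2): swap each connective, keep each priority."""
    tag, o = t[0], t[1]
    if tag in ("one", "bot"):
        return (DUAL[tag], o)
    if tag in ("tensor", "par"):
        return (DUAL[tag], o, dual(t[2]), dual(t[3]))
    if tag in ("oplus", "with"):
        return (DUAL[tag], o, {l: dual(a) for l, a in t[2].items()})
    return (DUAL[tag], o, dual(t[2]))      # query / bang

def pr(t):
    """Priority (Definition 4): the top-level annotation."""
    return t[1]

def lift(t, amount):
    """Lift (Definition 5): add `amount` to every priority in the type."""
    tag, o = t[0], t[1]
    if tag in ("one", "bot"):
        return (tag, o + amount)
    if tag in ("tensor", "par"):
        return (tag, o + amount, lift(t[2], amount), lift(t[3], amount))
    if tag in ("oplus", "with"):
        return (tag, o + amount, {l: lift(a, amount) for l, a in t[2].items()})
    return (tag, o + amount, lift(t[2], amount))

A = ("tensor", 0, ("one", 1), ("bot", 2))
assert dual(A) == ("par", 0, ("bot", 1), ("one", 2))
assert dual(dual(A)) == A                  # duality is an involution
assert pr(dual(A)) == pr(A)                # duality preserves priorities
assert lift(A, 3) == ("tensor", 3, ("one", 4), ("bot", 5))
assert lift(("bot", OMEGA), 7) == ("bot", OMEGA)   # ω + t = ω
```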

$$\frac{}{x \to y^A \;\vdash\; x: A^\perp,\, y: A}\ \textsf{Ax}
\qquad
\frac{P \vdash \Gamma \quad Q \vdash \Delta}{P \mid Q \;\vdash\; \Gamma, \Delta}\ \textsf{Mix}
\qquad
\frac{P \vdash \Gamma,\, x: A,\, y: A^\perp}{(\nu x^A y)P \;\vdash\; \Gamma}\ \textsf{Cycle}$$

$$\frac{}{\mathbf{0} \vdash \varnothing}\ \varnothing
\qquad
\frac{}{x[\,].\mathbf{0} \;\vdash\; x: \mathbf{1}^{\mathsf{o}}}\ \mathbf{1}
\qquad
\frac{P \vdash \Gamma \quad \mathsf{o} < \mathsf{pr}(\Gamma)}{x().P \;\vdash\; \Gamma,\, x: \bot^{\mathsf{o}}}\ \bot$$

$$\frac{P \vdash \Gamma,\, y: A,\, x: B \quad \mathsf{o} < \mathsf{pr}(\Gamma)}{x(y).P \;\vdash\; \Gamma,\, x: A \;⅋^{\mathsf{o}}\; B}\ ⅋
\qquad
\frac{P \vdash \Gamma,\, y: A,\, x: B \quad \mathsf{o} < \mathsf{pr}(\Gamma)}{x[y].P \;\vdash\; \Gamma,\, x: A \otimes^{\mathsf{o}} B}\ \otimes$$

$$\frac{P_i \vdash \Gamma,\, x: A_i \ \ \text{for all } i \in I \quad \mathsf{o} < \mathsf{pr}(\Gamma)}{x \rhd \{l_i : P_i\}_{i\in I} \;\vdash\; \Gamma,\, x: \&^{\mathsf{o}}\{l_i : A_i\}_{i\in I}}\ \&
\qquad
\frac{P \vdash \Gamma,\, x: A_j \quad j \in I \quad \mathsf{o} < \mathsf{pr}(\Gamma)}{x \lhd l_j.P \;\vdash\; \Gamma,\, x: \oplus^{\mathsf{o}}\{l_i : A_i\}_{i\in I}}\ \oplus$$

$$\frac{P \vdash\, ?\Gamma,\, y: A \quad \mathsf{o} < \mathsf{pr}(?\Gamma)}{{!}x(y).P \;\vdash\; ?\Gamma,\, x: {!}^{\mathsf{o}} A}\ !
\qquad
\frac{P \vdash \Gamma,\, y: A \quad \mathsf{o} < \mathsf{pr}(\Gamma)}{{?}x[y].P \;\vdash\; \Gamma,\, x: {?}^{\mathsf{o}} A}\ ?$$

$$\frac{P \vdash \Gamma}{P \;\vdash\; \Gamma,\, x: {?}^{\mathsf{o}} A}\ \textsf{W}
\qquad
\frac{P \vdash \Gamma,\, y: {?}^{\kappa} A,\, z: {?}^{\kappa'} A \quad \mathsf{o} \leq \kappa \quad \mathsf{o} \leq \kappa'}{P\{x/y, x/z\} \;\vdash\; \Gamma,\, x: {?}^{\mathsf{o}} A}\ \textsf{C}$$

### Fig. 2. Typing rules for PCP.

The typing rules are given in Fig. 2. Ax states that the forwarding process x→y^A is well typed if x and y have dual types, respectively A^⊥ and A. Mix types the parallel composition of two processes P and Q in the union of their disjoint typing contexts. Cycle is our key typing rule; it states that the restriction process is well typed if the endpoints x and y have dual types, respectively A and A^⊥. By Definition 2, A and A^⊥ also have the same priorities, enforcing law (ii) above. In classical logic this rule would be unsound, but in PLL it allows deadlock-free cycles. Rule ∅ states that inaction is well typed in the empty context. Rules **1** and ⊥ type channel closure actions from the viewpoint of each participant. Rule ⅋ (respectively ⊗) types an input process x(y).P (respectively, output process x[y].P), with y bound and x of type A ⅋^o B (respectively, A ⊗^o B). The priority o is strictly smaller than any priorities in the continuation process P, enforcing law (i) above. This is captured by o < pr(Γ) in the premises of both rules, abbreviating "for all z ∈ dom(Γ), o < pr(Γ(z))". Rules & and ⊕ type external and internal choice, respectively, and follow the previous two rules. Rule ! types a server and states that if P communicates along y following protocol A, then !x(y).P communicates along x following protocol !^o A. The three remaining rules type different numbers of clients. Rule ? is for a single client: if P communicates along y following A, then ?x[y].P communicates along x following ?^o A.
Rule W is for no client: if P does not communicate along any channel following A, then it may be regarded as communicating along x following ?^o A, for some priority o. Rule C is for multiple clients: if P communicates along y following ?^κ A, and along z following ?^{κ′} A, then P{x/y, x/z} communicates along a single channel x following ?^o A, where o ≤ κ and o ≤ κ′. The last two conditions are necessary to deal with some cases in the proof of Cycle-elimination (Theorem 1).

Lifting preserves typability, by an easy induction on typing derivations.

Lemma 1. *If* P ⊢ Γ *then* P ⊢ ↑^t Γ*.*

We will use this result in the form of an admissible rule:

$$\frac{P \vdash \Gamma}{P \;\vdash\; \uparrow^t \Gamma}\ \uparrow^t$$

The Design of PCP. We have included Mix and Cycle, which allow derivation of both the standard Cut and the Multicut by Abramsky *et al.* [2].

$$\frac{\dfrac{\vdash \Gamma, A_1, \ldots, A_n \qquad \vdash \Delta, A_1^\perp, \ldots, A_n^\perp}{\vdash \Gamma, \Delta, A_1, \ldots, A_n, A_1^\perp, \ldots, A_n^\perp}\ \textsf{Mix}}{\vdash \Gamma, \Delta}\ \textsf{Cycle}^n$$

Conversely, Mix is the nullary case of Multicut, and Cycle can be derived from Ax and Multicut:

$$\frac{\vdash \Gamma, A, A^\perp \qquad \dfrac{}{\vdash A^\perp, A}\ \textsf{Ax}}{\vdash \Gamma}\ \textsf{Multicut}$$

Having included Mix, we choose Cycle instead of Multicut, as Cycle is more primitive.

In the presence of Mix and Cycle, there is an isomorphism between A ⊗ B and A ⅋ B in CLL. Both A ⊗ B ⊢ A ⅋ B and A ⅋ B ⊢ A ⊗ B are derivable, where C ⊢ D denotes ⊢ C^⊥ ⅋ D in CLL. Equivalently, both ⊢ (A^⊥ ⅋ B^⊥) ⅋ (A ⅋ B) and ⊢ (A^⊥ ⊗ B^⊥) ⅋ (A ⊗ B) are derivable. For simplicity, let pr(A) = pr(B) = ω; by duality also pr(A^⊥) = pr(B^⊥) = ω.


These derivations, *without* priorities, show the isomorphism between A ⊗ B and A ⅋ B in CLL. The isomorphism does not hold in our PLL, in particular because o1 ≠ o2. The distinction between ⊗ and ⅋ preserves the distinction between output and input in the term assignment. However, to simplify derivations, both typing rules (Fig. 2) have the same form. The usual tensor rule, where there are two separate derivations in the premise rather than just one, is derivable by using Mix.

Our type system performs priority-checking. Priorities can be inferred, as in Kobayashi's type system [32] and the tool TyPiCal [28]. We have opted for priority checking over priority inference, as the presentation is more elegant.

The following two examples illustrate the use of priorities. We first establish the structure of the typing derivation, then calculate the priorities. We conclude the section by showing the typing for the cyclic scheduler from Sect. 1.

*Example 2 (Cyclic process: deadlock-free).* Consider the following process

$$P \triangleq (\nu x\_1 y\_1)(\nu x\_2 y\_2) \left[ x\_1(v).x\_2(w).R \mid y\_1[\mathbf{n}].y\_2[\mathbf{n}'].Q \right]$$

where R ≜ x1().v().x2().w().**0** and Q ≜ y1[ ].**0** *|* **n**[ ].**0** *|* y2[ ].**0** *|* **n**′[ ].**0**. First, we show the typing derivation for the left-hand side of the parallel, x1(v).x2(w).R:

$$\frac{\dfrac{\dfrac{\mathbf{0} \vdash \varnothing \qquad \kappa_4 < \kappa_3 < \kappa_2 < \kappa_1}{R \vdash x_1: \bot^{\kappa_4},\, v: \bot^{\kappa_3},\, x_2: \bot^{\kappa_2},\, w: \bot^{\kappa_1}}\ \bot^4 \qquad \mathsf{o}_1 < \kappa_4}{x_2(w).R \vdash x_1: \bot^{\kappa_4},\, v: \bot^{\kappa_3},\, x_2: \bot^{\kappa_1} \;⅋^{\mathsf{o}_1}\; \bot^{\kappa_2}}\ ⅋ \qquad \mathsf{o}_2 < \mathsf{o}_1}{x_1(v).x_2(w).R \vdash x_2: \bot^{\kappa_1} \;⅋^{\mathsf{o}_1}\; \bot^{\kappa_2},\, x_1: \bot^{\kappa_3} \;⅋^{\mathsf{o}_2}\; \bot^{\kappa_4}}\ ⅋ \quad (1)$$

Now, we show the typing derivation for the right-hand side of the parallel, y1[**n**].y2[**n**′].Q; recall that κ4 < κ3 < κ2 < κ1:

<sup>y</sup>1[ ].**<sup>0</sup>** <sup>y</sup><sup>1</sup> : **<sup>1</sup>**<sup>κ</sup><sup>4</sup> <sup>1</sup> **<sup>n</sup>**[ ].**<sup>0</sup> <sup>n</sup>**: **<sup>1</sup>**<sup>κ</sup><sup>3</sup> <sup>1</sup> <sup>y</sup>2[ ].**<sup>0</sup>** <sup>y</sup><sup>1</sup> : **<sup>1</sup>**<sup>κ</sup><sup>2</sup> <sup>1</sup> **<sup>n</sup>** [ ].**<sup>0</sup> n** : **<sup>1</sup>**<sup>κ</sup><sup>1</sup> <sup>1</sup> <sup>y</sup>1[ ].**<sup>0</sup>** *<sup>|</sup>* **<sup>n</sup>**[ ].**<sup>0</sup>** *<sup>|</sup>* <sup>y</sup>2[ ].**<sup>0</sup>** *<sup>|</sup>* **<sup>n</sup>** [ ].**<sup>0</sup>** y<sup>1</sup> : **1**<sup>κ</sup><sup>4</sup> , **n**: **1**<sup>κ</sup><sup>3</sup> , y<sup>2</sup> : **1**<sup>κ</sup><sup>2</sup> , **n** : **1**<sup>κ</sup><sup>1</sup> o<sup>3</sup> < κ<sup>4</sup> Mix<sup>3</sup> y2[**n** ].Q <sup>y</sup><sup>1</sup> : **<sup>1</sup>**<sup>κ</sup><sup>4</sup> , **<sup>n</sup>**: **<sup>1</sup>**<sup>κ</sup><sup>3</sup> , y<sup>2</sup> : **<sup>1</sup>**<sup>κ</sup><sup>1</sup> <sup>⊗</sup><sup>o</sup><sup>3</sup> **<sup>1</sup>**<sup>κ</sup><sup>2</sup> <sup>o</sup><sup>4</sup> <sup>&</sup>lt; <sup>o</sup><sup>3</sup> ⊗ y1[**n**].y2[**n** ].Q <sup>y</sup><sup>2</sup> : **<sup>1</sup>**<sup>κ</sup><sup>1</sup> <sup>⊗</sup><sup>o</sup><sup>3</sup> **<sup>1</sup>**<sup>κ</sup><sup>2</sup> , y<sup>1</sup> : **<sup>1</sup>**<sup>κ</sup><sup>3</sup> <sup>⊗</sup><sup>o</sup><sup>4</sup> **<sup>1</sup>**<sup>κ</sup><sup>4</sup> <sup>⊗</sup> (2)

Finally, the typing derivation for process P is as follows:

$$\frac{\dfrac{\dfrac{(1) \qquad (2)}{x_1(v).x_2(w).R \mid y_1[\mathbf{n}].y_2[\mathbf{n}'].Q \vdash x_2: \bot^{\kappa_1} \;⅋^{\mathsf{o}_1}\; \bot^{\kappa_2},\, x_1: \bot^{\kappa_3} \;⅋^{\mathsf{o}_2}\; \bot^{\kappa_4},\, y_2: \mathbf{1}^{\kappa_1} \otimes^{\mathsf{o}_3} \mathbf{1}^{\kappa_2},\, y_1: \mathbf{1}^{\kappa_3} \otimes^{\mathsf{o}_4} \mathbf{1}^{\kappa_4}}\ \textsf{Mix} \qquad \mathsf{o}_1 = \mathsf{o}_3}{(\nu x_2 y_2)\big(x_1(v).x_2(w).R \mid y_1[\mathbf{n}].y_2[\mathbf{n}'].Q\big) \vdash x_1: \bot^{\kappa_3} \;⅋^{\mathsf{o}_2}\; \bot^{\kappa_4},\, y_1: \mathbf{1}^{\kappa_3} \otimes^{\mathsf{o}_4} \mathbf{1}^{\kappa_4}}\ \textsf{Cycle} \qquad \mathsf{o}_2 = \mathsf{o}_4}{(\nu x_1 y_1)(\nu x_2 y_2)\big(x_1(v).x_2(w).R \mid y_1[\mathbf{n}].y_2[\mathbf{n}'].Q\big) \vdash \varnothing}\ \textsf{Cycle}$$

The system of equations

o<sup>2</sup> < o<sup>1</sup> o<sup>4</sup> < o<sup>3</sup> o<sup>1</sup> = o<sup>3</sup> o<sup>2</sup> = o<sup>4</sup>

can be solved by the assignment o<sup>1</sup> = o<sup>3</sup> = 1 and o<sup>2</sup> = o<sup>4</sup> = 0.

*Example 3 (Cyclic process: deadlocked!).* Now consider the process

$$P' = (\nu x\_1 y\_1)(\nu x\_2 y\_2) \left[ x\_1(v).x\_2(w).R \mid y\_2[\mathbf{n}'].y\_1[\mathbf{n}].Q \right]$$

where R = x1().v().x2().w().**0** and Q = y1[ ].**0** *|* **n**[ ].**0** *|* y2[ ].**0** *|* **n**′[ ].**0**. Notice that the order of actions on channels y1 and y2 is now swapped, thus causing a deadlock! If we tried to construct a typing derivation for process P′, we would have the following for the right-hand side of the parallel:

$$\frac{\dfrac{Q \vdash y_1: \mathbf{1}^{\kappa_4},\, \mathbf{n}: \mathbf{1}^{\kappa_3},\, y_2: \mathbf{1}^{\kappa_2},\, \mathbf{n}': \mathbf{1}^{\kappa_1} \qquad \mathsf{o}_4 < \kappa_4}{y_1[\mathbf{n}].Q \vdash y_1: \mathbf{1}^{\kappa_3} \otimes^{\mathsf{o}_4} \mathbf{1}^{\kappa_4},\, y_2: \mathbf{1}^{\kappa_2},\, \mathbf{n}': \mathbf{1}^{\kappa_1}}\ \otimes \qquad \mathsf{o}_3 < \mathsf{o}_4}{y_2[\mathbf{n}'].y_1[\mathbf{n}].Q \vdash y_2: \mathbf{1}^{\kappa_1} \otimes^{\mathsf{o}_3} \mathbf{1}^{\kappa_2},\, y_1: \mathbf{1}^{\kappa_3} \otimes^{\mathsf{o}_4} \mathbf{1}^{\kappa_4}}\ \otimes$$
Then, the system of equations

$$\mathbf{o}\_2 < \mathbf{o}\_1 \qquad \qquad \mathbf{o}\_3 < \mathbf{o}\_4 \qquad \qquad \mathbf{o}\_1 = \mathbf{o}\_3 \qquad \qquad \mathbf{o}\_2 = \mathbf{o}\_4$$

has no solution because it requires o<sup>2</sup> < o<sup>3</sup> and o<sup>3</sup> < o2, which is impossible.
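Whether such a system of priority constraints has a solution is a simple graph question: merge the variables related by equalities and check that the strict inequalities impose no cycle. The sketch below is our own illustration (function and variable names are ours, not part of the paper); it confirms that the constraints of Example 2 are satisfiable while those of the deadlocked process above are not.

```python
# Check a priority-constraint system: eqs are pairs (a, b) meaning a = b,
# lts are pairs (a, b) meaning a < b.  The system is solvable iff the
# strict-order graph over equality classes is acyclic (a DAG can always
# be numbered by a topological order).

def solvable(variables, eqs, lts):
    # Union-find to merge variables related by '='.
    parent = {v: v for v in variables}
    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v
    for a, b in eqs:
        parent[find(a)] = find(b)
    # Build edges a -> b for each a < b, then look for a cycle by DFS.
    edges = {find(v): set() for v in variables}
    for a, b in lts:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False            # a < b together with a = b is absurd
        edges[ra].add(rb)
    visiting, done = set(), set()
    def has_cycle(v):
        if v in done: return False
        if v in visiting: return True
        visiting.add(v)
        if any(has_cycle(w) for w in edges[v]): return True
        visiting.discard(v); done.add(v)
        return False
    return not any(has_cycle(find(v)) for v in variables)

V = {"o1", "o2", "o3", "o4"}
# Example 2: o2 < o1, o4 < o3, o1 = o3, o2 = o4  -- solvable (e.g. 1 and 0).
assert solvable(V, [("o1", "o3"), ("o2", "o4")], [("o2", "o1"), ("o4", "o3")])
# Example 3: o2 < o1, o3 < o4, o1 = o3, o2 = o4  -- forces o2 < o3 < o2.
assert not solvable(V, [("o1", "o3"), ("o2", "o4")], [("o2", "o1"), ("o3", "o4")])
```

This is essentially the check behind priority inference in Kobayashi-style systems: collect the (in)equalities generated by the typing rules and test them for consistency.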

*Example 1 continued (Cyclic Scheduler)*

$$\begin{array}{ll}
Sched \triangleq (\nu a_i b_i)_{i=0}^{n-1}\,(\nu c_i d_{(i+1)\bmod n})_{i=0}^{n-1}\,\big(A_0 \mid A_1 \mid \ldots \mid A_{n-1} \mid P_0 \mid P_1 \mid \ldots \mid P_{n-1}\big) & \\
A_0 \triangleq c_0[\mathbf{n}_0].d_0(x_0).a_0[\mathbf{m}_0].\mathsf{close}_0 & \\
A_i \triangleq d_i(x_i).a_i[\mathbf{m}_i].c_i[\mathbf{n}_i].\mathsf{close}_i & i \in \{1, \ldots, n-1\}\\
P_i \triangleq b_i(y_i).Q_i & i \in \{0, \ldots, n-1\}
\end{array}$$

By applying the typing rules in Fig. 2 we can derive Sched ⊢ ∅, since it is a closed process, and assign the following types and priorities:

$$\begin{array}{llll}
c_0: \mathbf{1} \otimes^{0} \mathbf{1} & d_0: \bot \;⅋^{2(n-1)}\; \bot & a_0: \mathbf{1} \otimes^{2(n-1)+1} \mathbf{1} & \text{for } A_0\\
d_i: \bot \;⅋^{2i-2}\; \bot & a_i: \mathbf{1} \otimes^{2i-1} \mathbf{1} & c_i: \mathbf{1} \otimes^{2i} \mathbf{1} & \text{for } A_i,\ 0 < i < n\\
b_0: \bot \;⅋^{2(n-1)+1}\; \bot & b_i: \bot \;⅋^{2i-1}\; \bot & & \text{for } P_0 \text{ and } P_i,\ 0 < i < n
\end{array}$$

The priorities of types ⊥ and **1** can easily be assigned as in Example 2. As the priority of d_{i+1} is 2(i + 1) − 2 = 2i, which is exactly the priority of c_i, we can connect these two endpoints with a Cycle.
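For a concrete ring size n, the stated assignment can be verified mechanically: priorities must strictly increase along each agent's sequence of prefixes, and endpoints joined by a Cycle must carry equal priorities. The following sketch is our own check, not part of the paper's development.

```python
# Verify the scheduler's priority assignment for a concrete ring size n:
#  - prefixes inside each agent carry strictly increasing priorities;
#  - endpoints joined by a Cycle carry equal priorities.

def check_scheduler(n):
    c = {0: 0, **{i: 2 * i for i in range(1, n)}}                    # c_i
    d = {0: 2 * (n - 1), **{i: 2 * i - 2 for i in range(1, n)}}      # d_i
    a = {0: 2 * (n - 1) + 1, **{i: 2 * i - 1 for i in range(1, n)}}  # a_i
    b = {0: 2 * (n - 1) + 1, **{i: 2 * i - 1 for i in range(1, n)}}  # b_i
    # A_0 = c_0[n_0].d_0(x_0).a_0[m_0]: priorities must strictly increase.
    assert c[0] < d[0] < a[0]
    # A_i = d_i(x_i).a_i[m_i].c_i[n_i] for 0 < i < n.
    for i in range(1, n):
        assert d[i] < a[i] < c[i]
    # Cycles: a_i with b_i, and c_i with d_{(i+1) mod n}.
    for i in range(n):
        assert a[i] == b[i]
        assert c[i] == d[(i + 1) % n]

for n in (2, 3, 5, 8):
    check_scheduler(n)
```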

### 3 Operational Semantics of PCP

In this section we define structural equivalence, the principal β-reduction rules and commuting conversions. The detailed derivations can be found in [18].

We define structural equivalence to be the smallest congruence relation satisfying the following axioms. SC-Ax-Swp allows swapping channels in the forwarding process. SC-Ax-Cycle states that cycle applied to a forwarding process is equivalent to inaction. This allows elimination of unnecessary cycles. Axioms SC-Mix-Nil, SC-Mix-Comm and SC-Mix-Asc state that parallel composition uses the inaction as the neutral element and is commutative and associative. SC-Cycle-Ext is the standard scope extrusion rule. SC-Cycle-Swp allows swapping channels and SC-Cycle-Comm states the commutativity of restriction<sup>1</sup>.

<sup>1</sup> Note that associativity of restriction is derived from SC-Mix-Comm and SC-Cycle-Comm.

$$\begin{array}{lrcl}
\textsf{SC-Ax-Swp} & x \to y^A & \equiv & y \to x^{A^\perp}\\
\textsf{SC-Ax-Cycle} & (\nu x^A y)(x \to y^A) & \equiv & \mathbf{0}\\
\textsf{SC-Mix-Nil} & P \mid \mathbf{0} & \equiv & P\\
\textsf{SC-Mix-Comm} & P \mid Q & \equiv & Q \mid P\\
\textsf{SC-Mix-Asc} & P \mid (Q \mid R) & \equiv & (P \mid Q) \mid R\\
\textsf{SC-Cycle-Ext} & (\nu x^A y)(P \mid Q) & \equiv & P \mid (\nu x^A y)Q \quad \text{if } x, y \notin \mathsf{fn}(P)\\
\textsf{SC-Cycle-Swp} & (\nu x^A y)P & \equiv & (\nu y^{A^\perp} x)P\\
\textsf{SC-Cycle-Comm} & (\nu x^A y)(\nu z^B w)P & \equiv & (\nu z^B w)(\nu x^A y)P
\end{array}$$
The core of the operational semantics consists of β-reductions. In π-calculus terms these are communication steps; in logical terms they are Cycle-elimination steps. β⊗⅋ is given in Fig. 3 to illustrate priorities. It simplifies a cycle connecting x of type A ⊗^o B and y of type A^⊥ ⅋^o B^⊥, which corresponds to communication between an output on x and an input on y, respectively. Both actions have priority o, which is strictly smaller than any priorities in their typing contexts, respecting the fact that they are top-level prefixes. The remaining β-reductions are summarised below. β_AxCycle simplifies a Cycle involving an axiom. β_**1**⊥ closes and eliminates channels. β⊕&, similarly to β⊗⅋, simplifies a communication between a selection and a branching. β!? simplifies a cycle between one server of type !^o A and one client of type ?^o A. The last two rules differ in the number of clients involved: rule β!W considers no clients, while β!C considers multiple clients.

$$\begin{array}{ll}
\beta_{\textsf{AxCycle}} & (\nu y^A z)(x \to y^A \mid P) \vdash \Gamma,\, x: A^\perp \;\longrightarrow\; P\{x/z\} \vdash \Gamma,\, x: A^\perp\\[2pt]
\beta_{\mathbf{1}\bot} & (\nu x^{\mathbf{1}^{\mathsf{o}}} y)(x[\,].\mathbf{0} \mid y().P) \vdash \Gamma \;\longrightarrow\; P \vdash \Gamma\\[2pt]
\beta_{\oplus\&} & (\nu x^{\oplus^{\mathsf{o}}\{l_i:B_i\}_{i\in I}} y)(x \lhd l_j.P \mid y \rhd \{l_i : Q_i\}_{i\in I}) \vdash \Gamma, \Delta \;\longrightarrow\; (\nu x^{B_j} y)(P \mid Q_j) \vdash \Gamma, \Delta\\[2pt]
\beta_{!?} & (\nu x^{{!}^{\mathsf{o}} A} y)({!}x(v).P \mid {?}y[w].Q) \vdash\, ?\Gamma, \Delta \;\longrightarrow\; (\nu v^{A} w)(P \mid Q) \vdash\, ?\Gamma, \Delta\\[2pt]
\beta_{!\textsf{W}} & (\nu x^{{!}^{\mathsf{o}} A} y)({!}x(v).P \mid Q) \vdash\, ?\Gamma, \Delta \;\longrightarrow\; Q \vdash\, ?\Gamma, \Delta \quad (y \text{ introduced by } \textsf{W})\\[2pt]
\beta_{!\textsf{C}} & (\nu x^{{!}^{\mathsf{o}} A} y)({!}x(v).P \mid Q\{y/y', y/y''\}) \vdash\, ?\Gamma, \Delta \;\longrightarrow\;\\
& \qquad (\nu x^{{!}^{\mathsf{o}} A} y')\big({!}x(v).P \mid (\nu x'^{{!}^{\mathsf{o}} A} y'')({!}x'(v).P \mid Q)\big) \vdash\, ?\Gamma, \Delta \quad (y \text{ introduced by } \textsf{C})
\end{array}$$

$$\frac{\dfrac{\dfrac{P \vdash \Gamma,\, v: A,\, x: B \quad \mathsf{o} < \mathsf{pr}(\Gamma)}{x[v].P \vdash \Gamma,\, x: A \otimes^{\mathsf{o}} B}\ \otimes \qquad \dfrac{Q \vdash \Delta,\, w: A^\perp,\, y: B^\perp \quad \mathsf{o} < \mathsf{pr}(\Delta)}{y(w).Q \vdash \Delta,\, y: A^\perp \;⅋^{\mathsf{o}}\; B^\perp}\ ⅋}{x[v].P \mid y(w).Q \vdash \Gamma, \Delta,\, x: A \otimes^{\mathsf{o}} B,\, y: A^\perp \;⅋^{\mathsf{o}}\; B^\perp}\ \textsf{Mix}}{(\nu x^{A \otimes^{\mathsf{o}} B} y)(x[v].P \mid y(w).Q) \vdash \Gamma, \Delta}\ \textsf{Cycle}$$

$$\longrightarrow \qquad \frac{\dfrac{P \vdash \Gamma,\, v: A,\, x: B \qquad Q \vdash \Delta,\, w: A^\perp,\, y: B^\perp}{P \mid Q \vdash \Gamma, \Delta,\, v: A,\, x: B,\, w: A^\perp,\, y: B^\perp}\ \textsf{Mix}}{(\nu v^{A} w)(\nu x^{B} y)(P \mid Q) \vdash \Gamma, \Delta}\ \textsf{Cycle}^2$$

Fig. 3. β-reduction for ⊗ and ⅋.

Commuting conversions, following [12,41], allow communication prefixes to be moved to the conclusion of a typing derivation, corresponding to pulling them out of the scope of Cycle rules. In order to account for the sequence of Cycles, here we use the notation x̃. Due to this movement, if a prefix on a channel endpoint x with priority o is pulled out at top level, then to preserve the priority conditions in the typing rules in Fig. 2, it is necessary to increase the priorities of all actions after the prefix on x. This increase is achieved by applying ↑^{o+1}(·) to the typing contexts.

$$\begin{array}{ll}
\kappa_\bot & (\nu \tilde{x}^{\tilde{A}} \tilde{y})\big(x().P \mid Q\big) \vdash \Gamma, \Delta,\, x: \bot^{\mathsf{o}} \;\longrightarrow\; x().\big[(\nu \tilde{x}^{\tilde{A}} \tilde{y})(P \mid Q)\big] \vdash\; \uparrow^{\mathsf{o}+1}\Gamma,\ \uparrow^{\mathsf{o}+1}\Delta,\, x: \bot^{\mathsf{o}}\\[2pt]
\kappa_\otimes & (\nu \tilde{x}^{\tilde{A}} \tilde{y})\big(x[v].P \mid Q\big) \vdash \Gamma, \Delta,\, x: A \otimes^{\mathsf{o}} B \;\longrightarrow\; x[v].\big[(\nu \tilde{x}^{\tilde{A}} \tilde{y})(P \mid Q)\big] \vdash\; (\uparrow^{\mathsf{o}+1}\Gamma),\ (\uparrow^{\mathsf{o}+1}\Delta),\, x: (\uparrow^{\mathsf{o}+1}A) \otimes^{\mathsf{o}} (\uparrow^{\mathsf{o}+1}B)\\[2pt]
\kappa_⅋ & (\nu \tilde{x}^{\tilde{A}} \tilde{y})\big(x(w).P \mid Q\big) \vdash \Gamma, \Delta,\, x: A \;⅋^{\mathsf{o}}\; B \;\longrightarrow\; x(w).\big[(\nu \tilde{x}^{\tilde{A}} \tilde{y})(P \mid Q)\big] \vdash\; (\uparrow^{\mathsf{o}+1}\Gamma),\ (\uparrow^{\mathsf{o}+1}\Delta),\, x: (\uparrow^{\mathsf{o}+1}A) \;⅋^{\mathsf{o}}\; (\uparrow^{\mathsf{o}+1}B)\\[2pt]
\kappa_\oplus & (\nu \tilde{x}^{\tilde{A}} \tilde{y})\big(x \lhd l_j.P \mid Q\big) \vdash \Gamma, \Delta,\, x: \oplus^{\mathsf{o}}\{l_i : B_i\}_{i\in I} \;\longrightarrow\; x \lhd l_j.\big[(\nu \tilde{x}^{\tilde{A}} \tilde{y})(P \mid Q)\big] \vdash\; (\uparrow^{\mathsf{o}+1}\Gamma),\ (\uparrow^{\mathsf{o}+1}\Delta),\, x: \oplus^{\mathsf{o}}\{l_i : \uparrow^{\mathsf{o}+1}B_i\}_{i\in I}\\[2pt]
\kappa_\& & (\nu \tilde{x}^{\tilde{A}} \tilde{y})\big(x \rhd \{l_i : P_i\}_{i\in I} \mid Q\big) \vdash \Gamma, \Delta,\, x: \&^{\mathsf{o}}\{l_i : B_i\}_{i\in I} \;\longrightarrow\; x \rhd \{l_i : (\nu \tilde{x}^{\tilde{A}} \tilde{y})(P_i \mid Q)\}_{i\in I} \vdash\; (\uparrow^{\mathsf{o}+1}\Gamma),\ (\uparrow^{\mathsf{o}+1}\Delta),\, x: \&^{\mathsf{o}}\{l_i : \uparrow^{\mathsf{o}+1}B_i\}_{i\in I}\\[2pt]
\kappa_? & (\nu \tilde{x}^{\tilde{A}} \tilde{y})\big({?}x[w].P \mid Q\big) \vdash \Gamma, \Delta,\, x: {?}^{\mathsf{o}} A \;\longrightarrow\; {?}x[w].\big[(\nu \tilde{x}^{\tilde{A}} \tilde{y})(P \mid Q)\big] \vdash\; (\uparrow^{\mathsf{o}+1}\Gamma),\ (\uparrow^{\mathsf{o}+1}\Delta),\, x: {?}^{\mathsf{o}}(\uparrow^{\mathsf{o}+1}A)\\[2pt]
\kappa_! & (\nu \tilde{x}^{{?}^{\mathsf{o}}\tilde{A}} \tilde{y})\big({!}x(v).P \mid Q\big) \vdash\, ?\Gamma, \Delta,\, x: {!}^{\mathsf{o}} A \;\longrightarrow\; {!}x(v).\big[(\nu \tilde{x}^{{?}^{\mathsf{o}}\tilde{A}} \tilde{y})(P \mid Q)\big] \vdash\; \uparrow^{\mathsf{o}+1}(?\Gamma),\ (\uparrow^{\mathsf{o}+1}\Delta),\, x: {!}^{\mathsf{o}}(\uparrow^{\mathsf{o}+1}A)
\end{array}$$

Finally, we give the following additional reduction rules: closure under structural equivalence, and two congruence rules, for restriction and for parallel.

$$\begin{array}{lll}
\textsc{Close-Equiv} & P \equiv Q,\ Q \longrightarrow R,\ R \equiv S & \text{implies } P \longrightarrow S\\
\textsc{Cong-Cycle} & P \longrightarrow Q & \text{implies } (\nu x^A y)P \longrightarrow (\nu x^A y)Q\\
\textsc{Cong-Mix} & P \longrightarrow Q & \text{implies } P \mid R \longrightarrow Q \mid R
\end{array}$$

### 4 Results for PLL and PCP

#### 4.1 Cycle-Elimination for PLL

We start with results for Cycle-elimination for PLL; thus here we refer to A, B as propositions, rather than types. The detailed proofs can be found in [18].

Definition 6. *The* degree *function* ∂(·) *on propositions is defined by:*

*–* $\partial(\mathbf{1}^{\mathsf{o}}) = \partial(\bot^{\mathsf{o}}) = 1$
*–* $\partial(A \otimes^{\mathsf{o}} B) = \partial(A ⅋^{\mathsf{o}} B) = \partial(A) + \partial(B) + 1$
*–* $\partial(\&^{\mathsf{o}}\{l_i : A_i\}_{i\in I}) = \partial(\oplus^{\mathsf{o}}\{l_i : A_i\}_{i\in I}) = \max_{i\in I}\{\partial(A_i)\} + 1$
*–* $\partial(?^{\mathsf{o}} A) = \partial(!^{\mathsf{o}} A) = \partial(A) + 1$*.*
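As a quick illustration of the definition (our own worked example, not taken from the paper), the degree of a small proposition is computed clause by clause:

$$\partial(\mathbf{1}^{\mathsf{o}_1} \otimes^{\mathsf{o}} \bot^{\mathsf{o}_2}) \;=\; \partial(\mathbf{1}^{\mathsf{o}_1}) + \partial(\bot^{\mathsf{o}_2}) + 1 \;=\; 1 + 1 + 1 \;=\; 3.$$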

Definition 7. *<sup>A</sup>* Maxicut *is a maximal sequence of* Mix *and* Cycle *rules, ending with a* Cycle *rule.*

Maximality means that the rules applied immediately before a Maxicut are any rules in Fig. 2, other than Mix or Cycle. The order in which Mix and Cycle rules are applied within a Maxicut is irrelevant. However, Proposition 1, which follows directly from structural equivalence (Sect. 3), allows us to simplify a Maxicut.

Proposition 1 (Canonical Maxicut). *Given an arbitrary* Maxicut*, it is always possible to obtain from it a* canonical Maxicut *consisting of a sequence of only* Mix *rules followed by a sequence of only* Cycle *rules.*

Definition 8. *<sup>A</sup>* single-Mix Maxicut *contains only one* Mix *rule.*

<sup>A</sup>1,...,An, A *are* Maxicut propositions *if they are eliminated by a* Maxicut*. The* degree of a sequence of Cycle*s is the sum of the degrees of the eliminated propositions.*

*The* degree of a Maxicut *is the sum of the degrees of the* Cycle*s in it.*

*The* degree of a proof <sup>π</sup>*,* <sup>d</sup>(π)*, is the* sup *of the degrees of its* Maxicut*s, implying* <sup>d</sup>(π)=0 *if and only if proof* <sup>π</sup> *has no* Cycle*s.*

*The* height of a proof $\pi$*,* $h(\pi)$*, is the height of its tree; it is defined as* $h(\pi) = \sup\{h(\pi_i)\}_{i\in I} + 1$*, where* $\{\pi_i\}_{i\in I}$ *are the subproofs of* $\pi$*.*

Maxicut has some similarities with the derived Multicut: it generalises Multicut in the number of Mixes, and a single-Mix Maxicut is an occurrence of Multicut.

The core of Cycle-elimination for our PLL, as for Cut-elimination for CLL [10,25], is the Principal Lemma (Lemma 3), which eliminates a Cycle by either (i) replacing it with another Cycle on simpler propositions, or (ii) pushing it further up the proof tree. Item (i) corresponds to (the logical part of) β-reductions (Sect. 3); and (ii) corresponds to (the logical part of) commuting conversions (Sect. 3).

Exceptionally, $\beta_{!C}$ reduces the original proof in a way that respects neither (i) nor (ii). In order to cope with this case, we introduce Lemma 2, which is inspired by Lemma B.1.3 in Bräuner [10] and adapted to our PLL. Lemma 2 allows us to reduce the degree of a proof that ends with a single-Mix Maxicut having the same degree as the whole proof, and where the last rule applied in the left-hand-side immediate subproof is $!$. Let $[n]$ denote the set $\{1, \ldots, n\}$.

Lemma 2 (Inspired by B.1.3 in Bräuner [10]). *Let* τ *be a proof of the following form, ending with a single-*Mix Maxicut*:*

$$
\dfrac{
  \dfrac{\begin{array}{c}\pi\\ \vdots\\ \vdash\, ?\Gamma,\, ?^{\mathsf{o}_1}A_1, \ldots, ?^{\mathsf{o}_n}A_n,\, A \qquad \mathsf{o} < \mathrm{pr}(?\Gamma) \qquad \forall i \in [n]:\ \mathsf{o} < \mathsf{o}_i\end{array}}
       {\vdash\, ?\Gamma,\, ?^{\mathsf{o}_1}A_1, \ldots, ?^{\mathsf{o}_n}A_n,\, !^{\mathsf{o}}A}\;!
  \qquad
  \dfrac{\begin{array}{c}\pi'\\ \vdots\\ \vdash \Delta,\, !^{\mathsf{o}_1}A_1^{\perp}, \ldots, !^{\mathsf{o}_n}A_n^{\perp},\, (?^{\kappa_j}A^{\perp})_{j\in[k]} \qquad \mathsf{o} < \mathrm{pr}(\Delta) \qquad \forall i \in [n]:\ \mathsf{o} < \mathsf{o}_i \qquad \forall j \in [k]:\ \mathsf{o} \leq \kappa_j\end{array}}
       {\vdash \Delta,\, !^{\mathsf{o}_1}A_1^{\perp}, \ldots, !^{\mathsf{o}_n}A_n^{\perp},\, ?^{\mathsf{o}}A^{\perp}}\;\mathsf{C}^{k-1}
}{
  \dfrac{\vdash\, ?\Gamma, \Delta,\, ?^{\mathsf{o}_1}A_1, \ldots, ?^{\mathsf{o}_n}A_n,\, !^{\mathsf{o}}A,\, !^{\mathsf{o}_1}A_1^{\perp}, \ldots, !^{\mathsf{o}_n}A_n^{\perp},\, ?^{\mathsf{o}}A^{\perp}}
       {\vdash\, ?\Gamma, \Delta}\;\textsc{Cycle}
}\;\textsc{Mix}
$$

*where* $d(\pi) < d(\tau)$ *and* $d(\pi') < d(\tau)$*. Then, there is a proof* $\tau'$ *of* $\vdash\, ?\Gamma, \Delta$ *such that* $d(\tau') < d(\tau)$*.*

*Proof.* Induction on $h(\pi')$, with a case analysis on the last rule applied in $\pi'$.

Lemma 3 (The Principal Lemma). *Let* τ *be a proof of* Γ*, ending with a canonical* Maxicut*:*

$$\frac{\frac{\pi\_1 \dots \pi\_m}{\vdash \Gamma, A\_1, \dots, A\_n, A, A\_1^\perp, \dots, A\_n^\perp, A^\perp} \text{ Max}}{\vdash \Gamma}\_{\text{CYCLE}}$$

*such that for all* $i \in [m]$*,* $d(\pi_i) < d(\tau)$*. Then there is a proof* $\tau'$ *of* $\vdash\, \uparrow^t \Gamma$*, for some* $t \geq 0$*, such that* $d(\tau') < d(\tau)$*.*

*Proof.* The proof is by induction on $\sum_{i\in[m]} h(\pi_i)$. Let $r_i$ be the last rule applied in $\pi_i$, for $i \in [m]$, and let $C_{r_i}$ be the proposition introduced by $r_i$. Consider the proposition with the *smallest* priority; if it is not unique, just pick one. Let this proposition be $C_{r_k}$. Then $\pi_k$ is a proof ending in an application of $r_k$ with conclusion $\vdash \Gamma', C_{r_k}$. We proceed by cases on $\pi_k$.

− $r_k$ is $\otimes$ on one of the Maxicut propositions $A_1, \ldots, A_n, A$. Without loss of generality, suppose $r_k$ is applied on $A$, meaning $A = E \otimes^{\mathsf{o}} F$ for some $E$ and $F$ and $\mathsf{o} \geq 0$. By the $\otimes$ rule in Fig. 2, $\mathsf{o} < \mathrm{pr}(\Gamma')$. Since $A$ is a Maxicut proposition, by Definition 2, $A^{\perp} = E^{\perp} ⅋^{\mathsf{o}} F^{\perp}$. Since $\mathsf{o} < \mathrm{pr}(\Gamma')$ and $\mathrm{pr}(A^{\perp}) = \mathsf{o}$, it must be that $A^{\perp}$ occurs in another proof, say $\pi_h$, whose last rule $r_h$ has conclusion $\vdash \Gamma'', E^{\perp} ⅋^{\mathsf{o}} F^{\perp}$.

Consider the case where $r_h$ is a multiplicative, additive, exponential or $\bot$ rule in Fig. 2. Suppose $r_h$ is applied on $C_{r_h}$, which is not $A^{\perp}$. All the mentioned rules require $\mathrm{pr}(C_{r_h}) < \mathrm{pr}(\Gamma'', E^{\perp} ⅋^{\mathsf{o}} F^{\perp} \setminus C_{r_h})$, implying $\mathrm{pr}(C_{r_h}) < \mathrm{pr}(E^{\perp} ⅋^{\mathsf{o}} F^{\perp}) = \mathrm{pr}(E \otimes^{\mathsf{o}} F) = \mathsf{o}$. This contradicts the fact that $\mathsf{o}$ is the smallest priority. Hence, $r_h$ must be a ⅋ rule introducing $A^{\perp}$.

We construct proof <sup>τ</sup><sup>A</sup> ending with a single-Mix Maxicut applied on *at least* A:

$$
\dfrac{
\dfrac{\begin{array}{c}\pi_\otimes\\ \vdots\\ \vdash \Gamma', E, F \qquad \mathsf{o} < \mathrm{pr}(\Gamma')\end{array}}{\vdash \Gamma', E \otimes^{\mathsf{o}} F}\;\otimes
\qquad
\dfrac{\begin{array}{c}\pi_⅋\\ \vdots\\ \vdash \Gamma'', E^{\perp}, F^{\perp} \qquad \mathsf{o} < \mathrm{pr}(\Gamma'')\end{array}}{\vdash \Gamma'', E^{\perp} ⅋^{\mathsf{o}} F^{\perp}}\;⅋
}{
\dfrac{\vdash \Gamma', \Gamma'', E \otimes^{\mathsf{o}} F, E^{\perp} ⅋^{\mathsf{o}} F^{\perp}}{\vdash \Gamma'''}\;\textsc{Cycle}
}\;\textsc{Mix}
$$

Then, by structural equivalence, we can rewrite $\tau$ in terms of $\tau_A$. By applying $\beta_\otimes$ on $\tau_A$ (only considering the logical part), we obtain a proof $\tau'_A$ such that $d(\tau'_A) < d(\tau_A) \leq d(\tau)$, because $\partial(E) + \partial(F) < \partial(E \otimes^{\mathsf{o}} F)$. We can then construct $\tau'$ by substituting $\tau'_A$ for $\tau_A$ in $\tau$, which concludes this case.

− $r_k$ is $!$ on one of the Maxicut propositions $A_1, \ldots, A_n, A$. Without loss of generality, suppose $r_k$ introduces $A$, implying that $A = \,!^{\mathsf{o}} A'$ for some $A'$ and $\mathsf{o} \geq 0$. Then $\pi_k$ is the following proof:

$$\dfrac{\begin{array}{c}\pi_!\\ \vdots\\ \vdash\, ?\Theta, A' \qquad \mathsf{o} < \mathrm{pr}(?\Theta)\end{array}}{\vdash\, ?\Theta,\; !^{\mathsf{o}} A'}\;!$$

where $\Gamma' = \,?\Theta$. Since $A$ is a Maxicut proposition, by duality $A^{\perp} = \,?^{\mathsf{o}} A'^{\perp}$. Since $\mathsf{o} < \mathrm{pr}(\Gamma')$ and $\mathrm{pr}(A^{\perp}) = \mathsf{o}$, it must be that $A^{\perp}$ is in another proof. Let it be $\pi_h$ for $h \in [m]$ and $h \neq k$. Then we apply Lemma 2 to $\pi_k$ and $\pi_h$, obtaining a proof which we use to construct $\tau'$, as we did in the previous case.

Lemma 4. *Given a proof* $\tau$ *of* $\vdash \Gamma$ *such that* $d(\tau) > 0$*, for some* $t \geq 0$ *there is a proof* $\tau'$ *of* $\vdash\, \uparrow^t \Gamma$ *such that* $d(\tau') < d(\tau)$*.*

*Proof.* By induction on h(τ ). We have the following cases.

<sup>−</sup> If <sup>τ</sup> ends in a Maxicut whose degree is *the same as* the degree of <sup>τ</sup> :

$$\frac{\dfrac{\pi_1 \ \ldots\ \pi_m}{\vdash \Gamma, A_1, \ldots, A_n, A, A_1^\perp, \ldots, A_n^\perp, A^\perp}\ \widetilde{\textsc{Mix}}}{\vdash \Gamma}\ \widetilde{\textsc{Cycle}}$$

we can apply the induction hypothesis to the subproofs of $\tau$ right before the last Mix preceding the sequence of Cycles. This allows us to reduce their degrees so that they become smaller than $d(\tau)$. Then we use Lemma 3.

− Otherwise, by using the inductive hypothesis on the immediate subproofs to reduce their degree, we also reduce the degree of the whole proof.

Theorem 1 (Cycle-Elimination). *Given any proof of* $\vdash \Gamma$*, we can construct a* Cycle*-free proof of* $\vdash\, \uparrow^t \Gamma$*, for some* $t \geq 0$*.*

*Proof.* By iterating Lemma 4.

Cycle-elimination increases the priorities of the propositions in $\Gamma$. This is solely due to (the logical part of) our commuting conversions in Sect. 3.

#### 4.2 Deadlock-Freedom for PCP

Theorem 2 (Subject Reduction). *If* $P \vdash \Gamma$ *and* $P \longrightarrow Q$*, then* $Q \vdash\, \uparrow^t \Gamma$*, for some* $t \geq 0$*.*

*Proof.* Follows from the β-reductions and commuting conversions in Sect. 3.

Definition 9. *A process is a* Cycle *if it is of the form* (*ν*x<sup>A</sup>y)P*.*

Theorem 3 (Top-Level Deadlock-Freedom). *If* $P \vdash \Gamma$ *and* $P$ *is a* Cycle*, then there is some* $Q$ *such that* $P \longrightarrow^{*} Q$ *and* $Q$ *is not a* Cycle*.*

*Proof.* The interpretation of Lemma <sup>3</sup> for PCP is that either (i) a top-level communication occurs, corresponding to a β-reduction, or (ii) commuting conversions are used to push Cycle further inwards in a process. Consequently, iterating Lemma <sup>3</sup> results in eliminating top-level Cycles.

Eliminating all Cycles, as specified by Theorem 1, would correspond to a semantics in which reduction occurs under prefixes, as discussed by Wadler [41]. In order to achieve this, we would need to introduce additional congruence rules, such as:

$$\frac{P \longrightarrow Q}{x(y).P \longrightarrow x(y).Q}$$

and similarly for other actions. Reductions of this kind are not present in the π-calculus, and we also omit them in our framework.

However, we can eliminate all Cycles in a proof of $\vdash \emptyset$, corresponding to full deadlock-freedom for closed processes. Kobayashi's type system [32] satisfies the same property.

Theorem 4 (Deadlock-Freedom for Closed Processes). *If* $P \vdash \emptyset$*, then either* $P \equiv \mathbf{0}$ *or there is* $Q$ *such that* $P \longrightarrow Q$*.*

*Proof.* This follows from Theorems 2 and 3, because if $Q \vdash \emptyset$ and $Q$ is not a Cycle, then $Q$ must be a parallel composition of $\mathbf{0}$ processes.

### 5 Related Work and Conclusion

Cycle and Multicut rules were explored by Abramsky *et al.* [2–4] in the context of ∗-autonomous categories. That work is not directly comparable with ours, as it only presented a typed semantics for CCS-like processes and did not give a type system for a language or a term assignment for a logical system. Atkey *et al.* [5] added a Multicut rule to CP, producing an isomorphism between ⊗ and ⅋, but they did not consider deadlock-freedom.

In Kobayashi's original type-theoretic approach to deadlock-freedom [29], priorities were abstract tags from a partially ordered set. In later work, abstract tags were simplified to natural numbers, and priorities were replaced by pairs of obligations and capabilities [30,32]. The latter change allows more processes to be typed, at the expense of a more complex type system. Padovani [36] adapted Kobayashi's approach to session types, and later simplified it to a single priority for the linear π-calculus [37]. The single-priority technique can then be transferred to session types by the encoding of session types into linear types [16,17,19,33]. For simplicity, we have opted for single priorities, as Padovani [37] did.

The first work on progress for session types, by Dezani-Ciancaglini *et al.* [15,22], guaranteed the property by allowing only one active session at a time. Later work [21] introduced a partial order on channels in Kobayashi-style [29]. Bettini *et al.* [9] applied similar ideas to multiparty session types. The main difference with our work is that we associate priorities with individual communication operations, rather than with entire channels. Carbone *et al.* [13] proved that progress is a compositional form of lock-freedom and introduced a new technique for progress in session types by adopting Kobayashi's type system and the encoding of session types [19]. Vieira and Vasconcelos [40] used single priorities and an abstract partial order in session types to guarantee deadlock-freedom.

The linear logic approach to deadlock-free session types started with Caires and Pfenning [12], based on dual intuitionistic linear logic, and was later formulated for classical linear logic by Wadler [41]. All subsequent work on linear logic and session types enforces deadlock-freedom by forbidding cyclic connections. In their original work, Caires and Pfenning commented that it would be interesting to compare process typability in their system with other approaches including Kobayashi's and Dezani-Ciancaglini's. However, we are aware of only one comparative study of the expressivity of type systems for deadlock-freedom, by Dardha and Pérez [20]. They compared Kobayashi-style typing and CLL typing, and proved that CLL corresponds to Kobayashi's system with the restriction that only single cuts, not multicuts, are allowed.

In this paper, we have presented a new logic, priority-based linear logic (PLL), and a term assignment system, priority-based CP (PCP), that increase the expressivity of deadlock-free session type systems, by combining Caires and Pfenning's linear logic-based approach and Kobayashi's priority-based type system. The novel feature of PLL and PCP is Cycle, which allows cyclic process structures to be formed if they do not violate ordering conditions on the priorities of prefixes. Following the propositions-as-types paradigm, we prove a Cycle-elimination theorem analogous to the standard Cut-elimination theorem. As a result of this theorem, we obtain deadlock-freedom for a class of π-calculus processes which is larger than the class typed by Caires and Pfenning. In particular, these are processes that typically share more than one channel in parallel.

There are two main directions for future work. First, develop a type system for a functional language, priority-based GV, and translate it into PCP, along the lines of Lindley and Morris' [34] translation of GV [41] into CP. Second, extend PCP to allow recursion and sharing [6], in order to support more general concurrent programming, while maintaining deadlock-freedom, as well as termination, or typed behavioural equivalence.

Acknowledgements. We are grateful for suggestions and feedback from the anonymous reviewers and colleagues: Wen Kokke, Sam Lindley, Roly Perera, Frank Pfenning, Carsten Schürmann and Philip Wadler.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **A Double Category Theoretic Analysis of Graded Linear Exponential Comonads**

Shin-ya Katsumata(B)

National Institute of Informatics, Tokyo, Japan s-katsumata@nii.ac.jp

**Abstract.** Graded linear exponential comonads are an extension of linear exponential comonads with *grading*, and provide a categorical semantics of the resource-sensitive exponential modality in linear logic. In this paper, we propose a concise double-category theoretic formulation of graded linear exponential comonads as a kind of monoid homomorphism from the multiplicative monoids of semirings to the composition monoids of symmetric monoidal endofunctors. We also exploit this formulation to derive the category of graded comonoid-coalgebras, which decomposes graded linear exponential comonads into symmetric monoidal adjunctions plus twists.

### **1 Introduction**

One of the important discoveries in substructural logic is the decomposition $\varphi \Rightarrow \psi = \,!\varphi \multimap \psi$ of the intuitionistic implication using the linear implication $\multimap$ and the *exponential modality* !. This discovery was made by Girard through his *linear logic*, which brought many new ideas and perspectives to logic and programming language semantics.

Inside linear logic proofs, propositions with the exponential modality !φ can be freely copied or discarded. Later, it was realized that by adding a copy limit to the exponential modality, as in !rφ, linear logic gains fine control of assumption usage. This idea was first implemented in *bounded linear logic* [9], and studied in connection with implicit complexity theory [4,14]. Indexed exponential modalities !<sup>r</sup> were then used in a wider context: resource management in programming languages [3,7,8,20,23] and control of sensitivity in the metric semantics of programs [5,21].

The categorical structure corresponding to the exponential modality ! was studied by various researchers, and it was identified as a categorical structure called *linear exponential comonad* [1]. One of the celebrated results about linear exponential comonads is that any symmetric lax monoidal adjunction:

$$(\mathbb{D},1,\times)\;\overset{L}{\underset{R}{\rightleftarrows}}\;(\mathbb{C},\mathbf{I},\otimes),\qquad L \dashv R \quad\text{(the monoidal structure } 1,\times\text{ is cartesian)}$$

yields a linear exponential comonad $L \circ R$, and every linear exponential comonad $D$ arises in this way: for $\mathbb{D}$, take the category of Eilenberg-Moore coalgebras of $D$.

© The Author(s) 2018 C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 110–127, 2018. https://doi.org/10.1007/978-3-319-89366-2_6

The categorical structure corresponding to the indexed exponential modality !<sup>r</sup> has been proposed as *exponential action* [3] and *graded linear exponential comonad* [7]; they are two different presentations of the same data. Compared to linear exponential comonads, however, categorical understanding of graded linear exponential comonads is not well-established. The aim of this paper is to contribute to this point. Concretely speaking, we show the following categorical results about graded linear exponential comonads:


### **2 Related Work**

Graded linear exponential comonads were first introduced as *exponential actions* in [3], and an equivalent definition was given in [7]. This paper adopts the latter definition as the starting point of study. These papers also consider linear type systems with an indexed exponential modality !rφ, which is directly interpreted by a graded linear exponential comonad. This paper, however, focuses only on the categorical axiomatics of the indexed exponential modality, and omits its syntactic theory. In [2], Breuvart and Pagani gave a construction of graded linear exponential comonads from a set of data called *stratification*. They derived various graded linear exponential comonads on the category of sets and binary relations and the category of coherence spaces. Structures close to, but different from, graded linear exponential comonads were considered in the categorical semantics of the following calculi: *INTML* for interactive computation [23], the *coeffect calculus* [20] and the *bounded affine type system* [8].

Looking at the dual structure, *graded monads*, first considered in mathematics [6,25], were recently used in the semantic study of logic, systems and programming languages [13,18,19,22]. The resolution of graded monads was studied in [12], mildly extending classic work by Street [26]. The major difference between graded monads and graded linear exponential comonads is the way they interact with the monoidal structure. In [13] only *strengths* were considered for graded monads, while graded linear exponential comonads interact with monoidal structures in an intricate manner.

The multicategory of symmetric lax monoidal multifunctors is related to the 2-multicategory of T-algebras for a pseudo-commutative 2-monad T [11]. Hyland and Power studied multifunctors that are symmetric *strong* monoidal in each argument, while in this paper we weaken "strong" to "lax". Yet, we think that by suitably extending their theory, the symmetric lax monoidal multifunctors can also be given in the language of 2-monad theory.

Monoids in the multicategory **MSMC**<sup>l</sup> in Sect. 5 are similar to the distributivity studied in [15], where Laplaza considered two symmetric *non-strict* monoidal structures together with a *colax* distributivity between them. On the other hand, in this paper, we consider a *strict* monoidal structure on top of the underlying symmetric (non-strict) monoidal structure, and a *lax* distributivity between them.

#### **Preliminaries**

For symmetric monoidal categories and symmetric lax monoidal functors, see [16]. In a symmetric monoidal category $\mathbb{C}$, by $\iota : \mathbf{I} \otimes \mathbf{I} \to \mathbf{I}$ we mean the isomorphism $\lambda_{\mathbf{I}} = \rho_{\mathbf{I}}$, and by $\tau : (A \otimes B) \otimes (C \otimes D) \to (A \otimes C) \otimes (B \otimes D)$ we mean the symmetry swapping the second and third components of the tensor product. For functors $F_i : \prod_{j=1}^{m_i} \mathbb{C}_{i,j} \to \mathbb{D}_i$ where $1 \leq i \leq n$, we define $F_1 \times \cdots \times F_n$ to be the composite functor $\prod_{1\leq i\leq n,\, 1\leq j\leq m_i} \mathbb{C}_{i,j} \to \prod_{i=1}^{n}\big(\prod_{j=1}^{m_i} \mathbb{C}_{i,j}\big) \to \prod_{i=1}^{n} \mathbb{D}_i$, whose domain is the product category without the nesting of products.

### **3 Graded Linear Exponential Comonad**

In this paper, comonads are graded by a *partially ordered semiring*. It is a tuple (R, <sup>≤</sup>, <sup>0</sup>, <sup>+</sup>, <sup>1</sup>, <sup>∗</sup>) such that (R, <sup>0</sup>, <sup>+</sup>, <sup>1</sup>, <sup>∗</sup>) is a unital semiring (not necessarily commutative) and +, <sup>∗</sup> are monotone in each argument w.r.t. the partial order <sup>≤</sup>. The partially ordered monoids of additive and multiplicative parts of R are denoted by <sup>R</sup><sup>+</sup> = (R, <sup>≤</sup>, <sup>0</sup>, +) and <sup>R</sup><sup>∗</sup> = (R, <sup>≤</sup>, <sup>1</sup>, <sup>∗</sup>), respectively.
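A standard instance (our own illustration, not an example from the paper) is the natural numbers with their usual order:

$$(\mathbb{N}, \leq, 0, +, 1, \times), \qquad \mathbb{N}^{+} = (\mathbb{N}, \leq, 0, +), \quad \mathbb{N}^{*} = (\mathbb{N}, \leq, 1, \times),$$

where both $+$ and $\times$ are monotone in each argument, so the tuple is a (commutative) partially ordered semiring.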

Let C, D be symmetric monoidal categories. We write **SMC**l(D, C) for the category of symmetric lax monoidal functors and monoidal natural transformations between them. The following pointwise extension of the tensor unit and tensor product on C extends to a symmetric monoidal structure on **SMC**l(D, C):

$$\dot{\mathbf{I}}(D) = \mathbf{I}, \quad (F \dot{\otimes} G)(D) = FD \otimes GD.$$

(We note that the symmetry in $\mathbb{C}$ is used to make $F \mathbin{\dot{\otimes}} G$ a symmetric lax monoidal functor.) Below, by $[\mathbb{D}, \mathbb{C}]_l$ we mean the symmetric monoidal category $(\mathbf{SMC}_l(\mathbb{D}, \mathbb{C}), \dot{\mathbf{I}}, \dot{\otimes})$ of symmetric lax monoidal functors and monoidal natural transformations between them.

#### **3.1 Graded Linear Exponential Comonad**

Fix a partially ordered semiring (R, <sup>≤</sup>, <sup>0</sup>, <sup>+</sup>, <sup>1</sup>, <sup>∗</sup>). We introduce the main subject of this study, R-*graded linear exponential comonad*. This concept first appeared in [3, Definition 13] under the name *exponential action*. We adopt the following definition [7, Sect. 5.2], which is equivalent to the exponential action:

**Fig. 1.** Four equational axioms related to distributive law

**Definition 1.** *An* R*-*graded linear exponential comonad *on a symmetric monoidal category* C *is a tuple* (D, w, c, ε, δ) *where*


*They satisfy four equational axioms in Fig. 1. Moreover, we say that* D *is an* R*-*twist *if* Dr *is strong monoidal for each* $r \in R$*, and* (D, ε, δ) *is a strict monoidal functor (hence* $D1 = \mathrm{Id}$ *and* $D(r * r') = Dr \circ Dr'$*).*

When fully expanded, a graded linear exponential comonad specifies one functor <sup>D</sup> : (R, <sup>≤</sup>) <sup>→</sup> [C, <sup>C</sup>] and 6 natural transformations:


satisfying more than 20 equational axioms.

*Example 1.* Let $\mathbb{C}$ be a cartesian closed category. We take a partially ordered monoid $R^{\times} = (R, \leq, 1, \times)$ such that $(R, \leq)$ is a join semilattice and $\times$ preserves joins in both arguments. This condition makes the tuple $R = (R, \leq, \bot, \vee, 1, \times)$ a partially ordered semiring. We also take a lax monoidal functor $G : R^{\times} \to \mathbb{C}$. Then the functor $D : (R, \leq)^{\mathrm{op}} \to [\mathbb{C}, \mathbb{C}]$ defined by $D_r A = Gr \Rightarrow A$ extends to an $R^{\mathrm{op}}$-graded linear exponential comonad on $\mathbb{C}$ (here $R^{\mathrm{op}}$ is the order-opposite of $R$).

*Example 2.* Continuing the previous example, let $R = (D, \leq, \bot, \vee, \top, \wedge)$ be a distributive lattice, regarded as a partially ordered semiring. We consider the functor category $[D, \mathbf{Set}]$, where $D$ is regarded as the discrete category on the carrier set $D$. We then define $G : R \to [D, \mathbf{Set}]$ by $(Gr)r' = \emptyset$ if $r \leq r'$, and $(Gr)r' = \{*\}$ if $r \not\leq r'$. This $G$ extends to a lax monoidal functor of type $G : R^{\times} \to [D, \mathbf{Set}]$. From the construction in the previous example, $D_r A = Gr \Rightarrow A$ is a graded linear exponential comonad, which coincides with the *masking functor* given in [7, Theorem 2]. It behaves as $(D_r A)r' = \{*\}$ if $r \leq r'$ and $(D_r A)r' = Ar'$ if $r \not\leq r'$. This graded linear exponential comonad is used to model the level of information flow [7, Sect. 6.1].
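To make the pointwise behaviour of the masking functor concrete, here is a small Python sketch (our own illustration; the lattice encoding and all names are ours, not from the paper). We encode a tiny lattice of levels as frozensets ordered by inclusion, and compute $(D_r A)r'$ directly from the case split above.

```python
# The masking functor D_r A on a tiny lattice of levels, encoded as
# frozensets ordered by inclusion. Illustration only, not code from the paper.

STAR = ["*"]  # represents the one-element set {*}

def D(r, A):
    """(D_r A) r' = {*} if r <= r', and A r' otherwise (r' is `r2` below)."""
    return lambda r2: STAR if r <= r2 else A(r2)

# An object A of [D, Set]: at each level r', a finite set of values.
A = lambda r2: ["data@" + ",".join(sorted(r2))]

low  = frozenset()           # bottom level
high = frozenset({"admin"})  # a strictly higher level

masked = D(high, A)
assert masked(high) == ["*"]     # at levels above `high`, A is masked
assert masked(low) == ["data@"]  # at incomparable/lower levels, A is visible
```

Note that frozenset's `<=` operator is subset inclusion, which plays the role of the lattice order here.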

*Example 3.* Consider the category **EPMet** of extended pseudometric spaces<sup>1</sup> and nonexpansive functions between them. It has a symmetric monoidal (closed) structure, whose unit is a terminal object, and whose tensor product is given by (X, d)⊗(Y,e)=(<sup>X</sup> <sup>×</sup>Y, d+e). It also has the *scaling modality* !r(X, d)=(X, rd), where r is an element of the ordered semiring of nonnegative extended reals, which we denote by [0,∞]. The scaling modality is a [0,∞]-twist with respect to the above symmetric monoidal structure.
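The tensor and scaling of Example 3 can also be sketched concretely (our own illustration, with our own function names): distances as Python functions, the tensor $d + e$ on pairs, and the scaling $r \cdot d$. At the level of distances, the twist property boils down to the semiring identity $r\,(d + e) = r\,d + r\,e$.

```python
# Extended pseudometric spaces represented by their distance functions.
# An illustrative sketch of Example 3, not code from the paper.

def tensor(d, e):
    """Tensor of metrics: (d + e)((x, y), (x', y')) = d(x, x') + e(y, y')."""
    return lambda p, q: d(p[0], q[0]) + e(p[1], q[1])

def scale(r, d):
    """The scaling modality !r sends (X, d) to (X, r * d)."""
    return lambda x, y: r * d(x, y)

d = lambda x, y: abs(x - y)               # a metric on numbers
e = lambda u, v: 0.0 if u == v else 2.0   # a two-valued pseudometric

# Scaling after tensoring agrees with tensoring the scaled metrics.
r, p, q = 3.0, (0, "a"), (1, "b")
lhs = scale(r, tensor(d, e))(p, q)
rhs = tensor(scale(r, d), scale(r, e))(p, q)
assert lhs == rhs == 9.0
```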

The concept of R-graded linear exponential comonad is a generalization of non-graded linear exponential comonad [1, Definition 3]. This was first observed in [3].

**Theorem 1.** *A* 1*-graded linear exponential comonad on a symmetric monoidal category* C *is exactly a non-graded linear exponential comonad on* C*.*

On the other hand, 1-twists make monoidal structures cartesian:

**Theorem 2.** *A* 1*-twist* D *exists on a symmetric monoidal category* C *if and only if the symmetric monoidal structure of* C *is cartesian (i.e.* **I** *is terminal and* ⊗ *is a binary product).*

*Proof.* If it exists, the functor part of D must be the identity functor Id<sup>C</sup> because of the strictness. Next, (Id, w, c) becomes a commutative monoid in [C, C]l; in particular, w, c are monoidal natural transformations. From [17, Corollary 17], the monoidal structure of C is cartesian. The converse construction is evident.

<sup>1</sup> Here, extended pseudometrics mean the pseudometrics that can return +∞.

### **4 A Double-Category Theoretic Reformulation of Graded Linear Exponential Comonad**

Although it is given in a reasonably compact form, the definition of graded linear exponential comonad is still technical, and it indeed specifies a quite complex structure. The motivation of this study is to obtain a conceptually clean and compact definition of it.

Particularly, what is less clear in the definition is the extra four axioms related to the distributive law (Fig. 1). In the non-graded setting (i.e. when R = 1), these four axioms reduce to simpler axioms, which can be viewed as the following conditions:


However, it is not obvious how to upgrade these axioms to the graded setting, because the concepts of "graded coalgebra" and "graded comonoid" are not yet defined, at least for graded linear exponential comonads. In particular, the concept of graded coalgebra should be defined after the concept of graded linear exponential comonad, which we are going to define! Because of this circularity, the above view of the four axioms is not very helpful when upgrading them in the current situation.

It is therefore desirable to have an alternative account of the four axioms in Fig. 1 that relies on a notion that already exists *before* graded linear exponential comonads. The key observation of this paper is that these four axioms are an instance of the axioms for 2-cells in the double category **SMC** of symmetric monoidal categories, introduced by Grandis and Paré [10, Sect. 2.3]. In **SMC**, a 2-cell consists of the following data:

$$\begin{array}{ccc}
\bullet & \xrightarrow{\;H\;} & \bullet\\
{\scriptstyle V'}\big\downarrow & \Downarrow a & \big\downarrow{\scriptstyle V}\\
\bullet & \xrightarrow[\;H'\;]{} & \bullet
\end{array}$$

where each • is a (possibly distinct) symmetric monoidal category, the horizontal morphisms H, H′ are symmetric lax monoidal functors, the vertical morphisms V, V′ are symmetric colax monoidal functors, and α : V ◦ H → H′ ◦ V′ is a natural transformation (between the underlying functors of H, H′, V, V′) making the following diagrams commute (below, m and n denote the lax and colax monoidal structure maps, respectively):

$$\begin{array}{ccc}
VI & \xrightarrow{\;V m^H\;} VHI \xrightarrow{\;\alpha_{\mathbf{I}}\;} & H'V'I \\
{\scriptstyle n^V}\big\downarrow & & \big\downarrow{\scriptstyle H' n^{V'}} \\
\mathbf{I} & \xrightarrow[\;m^{H'}\;]{} & H'I
\end{array}
\qquad
\begin{array}{ccc}
V(HX \otimes HY) & \xrightarrow{\;V m^H_{X,Y}\;} VH(X \otimes Y) \xrightarrow{\;\alpha_{X \otimes Y}\;} & H'V'(X \otimes Y) \\
{\scriptstyle n^V_{HX,HY}}\big\downarrow & & \big\downarrow{\scriptstyle H' n^{V'}_{X,Y}} \\
VHX \otimes VHY & \xrightarrow[\;m^{H'}_{V'X,V'Y} \circ (\alpha_X \otimes \alpha_Y)\;]{} & H'(V'X \otimes V'Y)
\end{array}
\quad (1)$$

We note that when H, H′ (resp. V, V′) are identity functors, the above axioms reduce to those for monoidal natural transformations of type V → V′ (resp. H → H′).

Let us see how the 2-cell axioms (1) in **SMC** derive the four axioms in Fig. 1.

**Proposition 1.** *In Definition 1, the four axioms (Fig. 1) can be replaced by the following statement: for each* <sup>r</sup> <sup>∈</sup> <sup>R</sup>*, both*

$$\delta\_{r,-}: D(r\*-) \to Dr \circ D-, \quad \delta\_{-,r}: D(-\*r) \to D- \circ Dr$$

*are 2-cells of the following type in* **SMC***:*

$$\begin{array}{ccc}
R^+ & \xrightarrow{\;r*-\;} & R^+ \\
{\scriptstyle D}\big\downarrow & \Downarrow\delta_{r,-} & \big\downarrow{\scriptstyle D} \\
[\mathbb{C},\mathbb{C}]_l & \xrightarrow[\;Dr\,\circ\,-\;]{} & [\mathbb{C},\mathbb{C}]_l
\end{array}
\qquad
\begin{array}{ccc}
R^+ & \xrightarrow{\;-*r\;} & R^+ \\
{\scriptstyle D}\big\downarrow & \Downarrow\delta_{-,r} & \big\downarrow{\scriptstyle D} \\
[\mathbb{C},\mathbb{C}]_l & \xrightarrow[\;-\,\circ\,Dr\;]{} & [\mathbb{C},\mathbb{C}]_l
\end{array}$$

### **5 Multicategory of Symmetric Lax Monoidal Multifunctors**

Proposition 1 says that by fixing one index of the doubly indexed natural transformation δ_{−,=} : D(− ∗ =) → D− ◦ D=, we obtain a 2-cell in the double category **SMC**. However, δ itself does not live in **SMC**. In order to make room to accommodate δ as a kind of 2-cell, we extend the horizontal morphisms of **SMC** to multi-ary functors that are symmetric lax monoidal in *each argument*. We first study such multi-ary functors in this section.

Let Ci (1 ≤ i ≤ n) and D be symmetric monoidal categories. Intuitively, an n-ary functor F : C1 × ··· × Cn → D is symmetric lax monoidal in each argument if it comes with a structure making the functor F(C1, .., −m, .., Cn) : Cm → D symmetric lax monoidal for each m ∈ {1, ··· , n} and Ci ∈ Ci, i ∈ {1, ··· , n}\{m}. Moreover, these symmetric lax monoidal structures commute with each other in a coherent manner.

To formally define such multi-ary symmetric lax monoidal functors, we introduce a notation for sequences. For a sequence C = C1, ··· , Cn of mathematical objects, a natural number 1 ≤ i ≤ n and another sequence D, by C[i : D] we mean the sequence obtained by replacing Ci with D. For instance, (1, 3, 5)[2 : X, Y] = 1, X, Y, 5. When D is empty, C[i :] stands for the sequence obtained by removing the i-th element of C.
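As a sanity check of this splice notation, here is a direct Python transcription (the helper name is ours, introduced only for this sketch):

```python
def splice(C, i, D=()):
    """The notation C[i : D]: replace the i-th element (1-indexed) of the
    tuple C by the sequence D; an empty D removes the element (C[i :])."""
    return C[:i - 1] + tuple(D) + C[i:]
```

For example, `splice((1, 3, 5), 2, ('X', 'Y'))` yields `(1, 'X', 'Y', 5)`, matching the example above, and `splice((1, 3, 5), 2)` yields `(1, 5)`.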

**Definition 2.** *A* symmetric lax monoidal multifunctor *of type* (C1, ··· , Cn) → D *consists of a functor and a family of natural transformations indexed by* 1 ≤ i ≤ n*:*

$$F: \mathbb{C}\_1 \times \dots \times \mathbb{C}\_n \to \mathbb{D}$$

$$\phi^i_{C[i:]}: \mathbf{I} \to F(C[i:\mathbf{I}]) \quad \text{( $C \in \mathbb{C}_1 \times \dots \times \mathbb{C}_n$ )}$$

$$\phi^i\_{C[i:X,Y]}: F(C[i:X]) \otimes F(C[i:Y]) \to F(C[i:X \otimes Y]) \quad \text{( $C \in \mathbb{C}\_1 \times \dots \times \mathbb{C}\_n, X, Y \in \mathbb{C}\_i$ )}$$

*such that:*

- *for each* 1 ≤ i ≤ n *and* C*, the triple* (F(C[i : −]), φ^i_{C[i:]}, φ^i_{C[i:−,−]}) *is a symmetric lax monoidal functor from* Ci *to* D*;*
- *for each pair of distinct indices* i ≠ j*:*

$$\begin{aligned}
\phi^i_{C[j:\mathbf{I}][i:]} &= \phi^j_{C[i:\mathbf{I}][j:]} \\
\phi^i_{C[j:P\otimes Q][i:]} \circ \iota &= \phi^j_{C[i:\mathbf{I}][j:P,Q]} \circ \bigl(\phi^i_{C[j:P][i:]} \otimes \phi^i_{C[j:Q][i:]}\bigr) \\
\phi^j_{C[i:X\otimes Y][j:P,Q]} \circ \bigl(\phi^i_{C[j:P][i:X,Y]} \otimes \phi^i_{C[j:Q][i:X,Y]}\bigr) &= \phi^i_{C[j:P\otimes Q][i:X,Y]} \circ \bigl(\phi^j_{C[i:X][j:P,Q]} \otimes \phi^j_{C[i:Y][j:P,Q]}\bigr) \circ \tau
\end{aligned}$$

We note that a symmetric lax monoidal multifunctor of type () <sup>→</sup> <sup>D</sup> is just an object in D, because all natural transformations vanish and only the functor of type 1 <sup>→</sup> <sup>D</sup> remains.

*Example 4.* Let us see how the definition of a binary symmetric lax monoidal multifunctor M : (C, C) → C unfolds. It consists of a functor M : C × C → C and the following natural transformations:

$$\begin{aligned} \phi^1\_C: \mathbf{I} \to M(\mathbf{I}, C), \quad \phi^1\_{X, Y, C}: M(X, C) \otimes M(Y, C) \to M(X \otimes Y, C) \\ \phi^2\_C: \mathbf{I} \to M(C, \mathbf{I}), \quad \phi^2\_{C, X, Y}: M(C, X) \otimes M(C, Y) \to M(C, X \otimes Y) \end{aligned}$$

such that


$$\begin{split} \phi^1\_\mathbf{I} &= \phi^2\_\mathbf{I}, \quad \phi^1\_{C \otimes C'} \circ \iota = \phi^2\_{\mathbf{I}, C, C'} \circ (\phi^1\_C \otimes \phi^1\_{C'}), \quad \phi^2\_{C \otimes C'} \circ \iota = \phi^1\_{C, C', \mathbf{I}} \circ (\phi^2\_C \otimes \phi^2\_{C'}) \\ \phi^2\_{C \otimes C', D, D'} &\circ (\phi^1\_{C, C', D} \otimes \phi^1\_{C, C', D'}) = \phi^1\_{C, C', D \otimes D'} \circ (\phi^2\_{C, D, D'} \otimes \phi^2\_{C', D, D'}) \circ \tau \end{split}$$

We will later use the following binary symmetric lax monoidal multifunctors. Let R be a partially ordered semiring and C be a symmetric monoidal category.

- (∗) : (R+, R+) → R+, the multiplication of R;
- (◦) : ([C, C]l, [C, C]l) → [C, C]l, composition of functors;
- ev : ([C, C]l, C) → C, the evaluation functor ev(F, C) = FC.

Note that (∗) is symmetric strict monoidal in each argument, while (◦) and ev are symmetric strict monoidal in the first argument and symmetric lax monoidal in the second argument.
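For instance, over the semiring of natural numbers the argument-wise monoidal structure of (∗) with respect to the additive structure of R+ is exactly distributivity, which holds on the nose; this is why (∗) is strict in each argument. A quick numeric spot-check (our illustration):

```python
import itertools

# In (N, +, *), multiplication distributes strictly over addition in each
# argument, so the structure maps phi^1, phi^2 of (*) are identities.
for r, s, t in itertools.product(range(6), repeat=3):
    assert r * (s + t) == r * s + r * t  # lax structure in the 2nd argument
    assert (s + t) * r == s * r + t * r  # lax structure in the 1st argument
    assert r * 0 == 0 == 0 * r           # unit components
```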

Next, for symmetric lax monoidal multifunctors (F, φ) : (C1, ··· , Cn) → D and (Gi, γ^(i)) : (Bi,1, ··· , Bi,mi) → Ci (1 ≤ i ≤ n), we define their *multicomposition*. First, we fix a bijection (/) : {(i, j) | 1 ≤ i ≤ n, 1 ≤ j ≤ mi} → {1, ··· , Σ_{1≤i≤n} mi}, and represent a number in the latter set as the pair of numbers uniquely determined by (/) in the former set. Then the multicomposition is given by the following (H, η):

$$\begin{aligned}
H &= F \circ (G_1 \times \cdots \times G_n) \\
\eta^{i/j}_{(B_1,\cdots,B_n)[i/j:]} &= F\bigl((G_1B_1, \cdots, G_nB_n)[i : \gamma^{(i)\,j}_{B_i[j:]}]\bigr) \circ \phi^i_{(G_1B_1,\cdots,G_nB_n)[i:]} \\
\eta^{i/j}_{(B_1,\cdots,B_n)[i/j:X,Y]} &= F\bigl((G_1B_1, \cdots, G_nB_n)[i : \gamma^{(i)\,j}_{B_i[j:X,Y]}]\bigr) \circ \phi^i_{(G_1B_1,\cdots,G_nB_n)[i:G_i(B_i[j:X]),\,G_i(B_i[j:Y])]}
\end{aligned}$$

**Theorem 3.** *Symmetric monoidal categories, symmetric lax monoidal multifunctors, and the above multicomposition form a multicategory* **MSMC**l*.*

*Proof (sketch).* To check that symmetric lax monoidal multifunctors are closed under multicomposition, the key cases are n = 2, m1 = m2 = 1 and n = 1, m1 = 2.

In **MSMC**<sup>l</sup> we consider *monoids* and *monoid actions*. A *monoid* is a tuple (C, U : () <sup>→</sup> <sup>C</sup>, M : (C, <sup>C</sup>) <sup>→</sup> <sup>C</sup>) of a symmetric monoidal category <sup>C</sup> and symmetric lax monoidal multifunctors U, M such that

$$\mathrm{Id} = M \circ (\mathrm{Id}, U), \quad \mathrm{Id} = M \circ (U, \mathrm{Id}), \quad M \circ (\mathrm{Id}, M) = M \circ (M, \mathrm{Id}).$$

An *action* of a monoid (C, U, M) on a symmetric monoidal category D is a symmetric lax monoidal multifunctor <sup>A</sup> : (C, <sup>D</sup>) <sup>→</sup> <sup>D</sup> such that

$$A \circ (U, \mathrm{Id}) = \mathrm{Id}, \quad A \circ (\mathrm{Id}, A) = A \circ (M, \mathrm{Id}).$$

By unfolding the definition, a monoid (C, U, M) in **MSMC**<sup>l</sup> equips C with an additional strict monoidal structure (U, M). The argument-wise symmetric lax monoidal structure on M becomes a lax distributivity (see Example 4). Thus we call a monoid in **MSMC**<sup>l</sup> a *lax distributive strict rig category*. It has a smaller set of coherence axioms than the one given by Laplaza in [15], thanks to the strictness of (U, M).

*Example 5* (Continued from Example 4). (R+, 1, ∗) and ([C, C]l, Id, ◦) are both lax distributive strict rig categories. Both monoids act on themselves. The latter monoid also acts on C via the evaluation functor ev.

### **6 Graded Linear Exponential Comonads as Vertical Monoid Homomorphisms**

We now extend the double category **SMC** of Grandis and Paré by replacing the horizontal morphisms with symmetric lax monoidal multifunctors. The concept of 2-cell in **SMC** is accordingly replaced by that of a *prism*, so named because a prism sits in the middle of the space bounded by two horizontal multifunctors and the vertical morphisms. Such a prism is defined to be a natural transformation that is a 2-cell of **SMC** in *each argument*.

**Definition 3.** *Let* F : (C1, ··· , Cn) → D *and* G : (E1, ··· , En) → F *be symmetric lax monoidal multifunctors, and let* Vi : Ci → Ei *(*1 ≤ i ≤ n*) and* W : D → F *be symmetric colax monoidal functors. A* prism α *of type* (V1, ··· , Vn) → W : F → G*, depicted as*

$$\begin{array}{ccc}
(\mathbb{C}_1, \dots, \mathbb{C}_n) & \xrightarrow{\;F\;} & \mathbb{D} \\
{\scriptstyle (V_1,\dots,V_n)}\big\downarrow & \Downarrow\alpha & \big\downarrow{\scriptstyle W} \\
(\mathbb{E}_1, \dots, \mathbb{E}_n) & \xrightarrow[\;G\;]{} & \mathbb{F}
\end{array}$$

*is a natural transformation* α : W ◦ F → G ◦ (V1 × ··· × Vn) *such that for each* C ∈ ∏_{i=1}^{n} Ci *and* 1 ≤ i ≤ n*,* α_{C[i:−]} *is a 2-cell of the following type in the double category* **SMC***:*

$$\begin{array}{ccc}
\mathbb{C}_i & \xrightarrow{\;F(C[i:-])\;} & \mathbb{D} \\
{\scriptstyle V_i}\big\downarrow & \Downarrow\alpha_{C[i:-]} & \big\downarrow{\scriptstyle W} \\
\mathbb{E}_i & \xrightarrow[\;G((V_1C_1,\dots,V_nC_n)[i:-])\;]{} & \mathbb{F}
\end{array}$$

We note that when <sup>n</sup> = 0, a prism <sup>α</sup> : () <sup>→</sup> <sup>W</sup> : <sup>F</sup> <sup>→</sup> <sup>G</sup> is simply a morphism <sup>α</sup> : W F <sup>→</sup> <sup>G</sup> in <sup>F</sup>.

**Proposition 2.** *Let* D : R+ → [C, C]l *be a symmetric colax monoidal functor and let* δ *be a prism of type* (D, D) → D : (∗) → (◦)*, where* (∗) *and* (◦) *are the symmetric lax monoidal multifunctors that appeared in Example 4. Then for each* r ∈ R*,* δr,− *and* δ−,r *are 2-cells of the following type in* **SMC***:*

$$\begin{array}{ccc}
R^+ & \xrightarrow{\;r*-\;} & R^+ \\
{\scriptstyle D}\big\downarrow & \Downarrow\delta_{r,-} & \big\downarrow{\scriptstyle D} \\
[\mathbb{C},\mathbb{C}]_l & \xrightarrow[\;Dr\,\circ\,-\;]{} & [\mathbb{C},\mathbb{C}]_l
\end{array}
\qquad
\begin{array}{ccc}
R^+ & \xrightarrow{\;-*r\;} & R^+ \\
{\scriptstyle D}\big\downarrow & \Downarrow\delta_{-,r} & \big\downarrow{\scriptstyle D} \\
[\mathbb{C},\mathbb{C}]_l & \xrightarrow[\;-\,\circ\,Dr\;]{} & [\mathbb{C},\mathbb{C}]_l
\end{array}$$

As in double categories, prisms can be composed in two directions. Consider the following prisms (1 ≤ i ≤ n):

$$\begin{array}{ccc}
(\mathbb{B}_{i,1},\dots,\mathbb{B}_{i,m_i}) & \xrightarrow{\;G_i\;} & \mathbb{C}_i \\
{\scriptstyle (U_{i,1},\dots,U_{i,m_i})}\big\downarrow & \Downarrow\gamma_i & \big\downarrow{\scriptstyle V_i} \\
(\mathbb{B}'_{i,1},\dots,\mathbb{B}'_{i,m_i}) & \xrightarrow[\;G'_i\;]{} & \mathbb{C}'_i
\end{array}
\qquad
\begin{array}{ccc}
(\mathbb{C}_1,\dots,\mathbb{C}_n) & \xrightarrow{\;F\;} & \mathbb{D} \\
{\scriptstyle (V_1,\dots,V_n)}\big\downarrow & \Downarrow\alpha & \big\downarrow{\scriptstyle W} \\
(\mathbb{C}'_1,\dots,\mathbb{C}'_n) & \xrightarrow{\;F'\;} & \mathbb{D}' \\
{\scriptstyle (V'_1,\dots,V'_n)}\big\downarrow & \Downarrow\beta & \big\downarrow{\scriptstyle W'} \\
(\mathbb{C}''_1,\dots,\mathbb{C}''_n) & \xrightarrow[\;F''\;]{} & \mathbb{D}''
\end{array}$$

Then define *vertical composition* and *horizontal multicomposition* of prisms by the following (ordinary) natural transformations:

$$\begin{aligned} \beta \odot \alpha &= (\beta \circ (V\_1 \times \dots \times V\_n)) \bullet (W' \circ \alpha) \\ \alpha \circledast (\gamma\_1, \dots, \gamma\_n) &= (F' \circ (\gamma\_1 \times \dots \times \gamma\_n)) \bullet (\alpha \circ (G\_1 \times \dots \times G\_n)) \end{aligned}$$

where • on the right hand side is the vertical composition of natural transformations.

#### **Proposition 3.** *In the above setting,* β ⊙ α *and* α ⊛ (γ1, ··· , γn) *are again prisms. Moreover, given further prisms* δ1, ··· , δn *stacked below* γ1, ··· , γn*, vertical and horizontal composition satisfy the interchange law:*

$$(\beta \circledast (\delta_1, \cdots, \delta_n)) \odot (\alpha \circledast (\gamma_1, \cdots, \gamma_n)) = (\beta \odot \alpha) \circledast (\delta_1 \odot \gamma_1, \cdots, \delta_n \odot \gamma_n).$$

**Definition 4.** *Let* (C, U, M), (D, U′, M′) *be monoids in* **MSMC**l*. A* vertical monoid homomorphism *consists of a symmetric colax monoidal functor* A : C → D *and prisms* ε : () → A : U → U′ *and* δ : (A, A) → A : M → M′*:*

$$\begin{array}{ccc}
() & \xrightarrow{\;U\;} & \mathbb{C} \\
\big\| & \Downarrow\epsilon & \big\downarrow{\scriptstyle A} \\
() & \xrightarrow[\;U'\;]{} & \mathbb{D}
\end{array}
\qquad
\begin{array}{ccc}
(\mathbb{C}, \mathbb{C}) & \xrightarrow{\;M\;} & \mathbb{C} \\
{\scriptstyle (A,A)}\big\downarrow & \Downarrow\delta & \big\downarrow{\scriptstyle A} \\
(\mathbb{D}, \mathbb{D}) & \xrightarrow[\;M'\;]{} & \mathbb{D}
\end{array}$$

*such that the following prism equalities hold:*

$$
\delta \circledast (\mathrm{id}, \epsilon) = \mathrm{id}, \quad \delta \circledast (\epsilon, \mathrm{id}) = \mathrm{id}, \quad \delta \circledast (\mathrm{id}, \delta) = \delta \circledast (\delta, \mathrm{id}).
$$

The above prism equalities amount to the following equalities of natural transformations:

$$\begin{aligned} M'(AX,\epsilon) \circ \delta\_{X,U} &= \text{id} & M'(\epsilon, AX) \circ \delta\_{U,X} &= \text{id} \\ M'(AX,\delta\_{Y,Z}) \circ \delta\_{X,M(Y,Z)} &= M'(\delta\_{X,Y}, AZ) \circ \delta\_{M(X,Y),Z} \end{aligned}$$

With this concept, we can concisely capture R-graded linear exponential comonads:

**Theorem 4.** *Let* R *be a partially ordered semiring and* C *be a symmetric monoidal category. There is a bijective correspondence between*

1. R*-graded linear exponential comonads on* C*, and*
2. *vertical monoid homomorphisms from* (R+, 1, ∗) *to* ([C, C]l, Id, ◦)*.*
Vertical monoid homomorphisms vertically compose. Therefore we can extend a graded linear exponential comonad (as a vertical monoid homomorphism) by stacking vertical monoid homomorphisms.

**Proposition 4.** *Let* R, S *be partially ordered semirings. Then a vertical monoid homomorphism from* (R+, 1R, ∗R) *to* (S+, 1S, ∗S) *bijectively corresponds to a monotone function* h : (R, ≤R) → (S, ≤S) *such that* h(Σ_R ri) ≤ Σ_S h(ri) *and* h(∏_R ri) ≤ ∏_S h(ri) *for finite families* (ri) *(we call such an* h *a* colax homomorphism*).*
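A simple instance, offered only as our own illustration: the support map h(n) = min(n, 1) from the natural-number semiring (ℕ, +, ×, ≤) to the Boolean semiring ({0, 1}, ∨, ∧, ≤) is monotone and satisfies the colax inequalities (here even with equality):

```python
def h(n):
    # support map N -> {0, 1}; Boolean sum is max (or), product is min (and)
    return min(n, 1)

for r in range(5):
    for s in range(5):
        assert h(r + s) <= max(h(r), h(s))   # h(r +_R s) <= h(r) +_S h(s)
        assert h(r * s) <= min(h(r), h(s))   # h(r *_R s) <= h(r) *_S h(s)
        assert (r <= s) <= (h(r) <= h(s))    # monotonicity as an implication
```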

**Proposition 5.** *Let* F ⊣ U : C → D *be a symmetric lax monoidal adjunction. Then the functor* V^{F⊣U} *defined by* V^{F⊣U}H = F ◦ H ◦ U *is a vertical monoid homomorphism from* ([C, C]l, Id, ◦) *to* ([D, D]l, Id, ◦)*.*

*Proof.* Let F ⊣ U : C → D be a symmetric lax monoidal adjunction. By Kelly's doctrinal adjunction, F is symmetric strong monoidal, hence so is F ◦ − in the following diagram:

$$V^{F \dashv U} = [\mathbb{C}, \mathbb{C}]_l \xrightarrow{\;F \circ -\;} [\mathbb{C}, \mathbb{D}]_l \xrightarrow{\;- \circ U\;} [\mathbb{D}, \mathbb{D}]_l$$

Next, − ◦ U above is always symmetric strict monoidal. By composing them, we obtain that V^{F⊣U} is symmetric strong, hence colax, monoidal. We next introduce prisms (ε, δ) of the following type:

$$\begin{array}{ccc}
() & \xrightarrow{\;\mathrm{Id}\;} & [\mathbb{C},\mathbb{C}]_l \\
\big\| & \Downarrow\epsilon & \big\downarrow{\scriptstyle V^{F\dashv U}} \\
() & \xrightarrow[\;\mathrm{Id}\;]{} & [\mathbb{D},\mathbb{D}]_l
\end{array}
\qquad
\begin{array}{ccc}
([\mathbb{C},\mathbb{C}]_l, [\mathbb{C},\mathbb{C}]_l) & \xrightarrow{\;(\circ)\;} & [\mathbb{C},\mathbb{C}]_l \\
{\scriptstyle (V^{F\dashv U},\,V^{F\dashv U})}\big\downarrow & \Downarrow\delta & \big\downarrow{\scriptstyle V^{F\dashv U}} \\
([\mathbb{D},\mathbb{D}]_l, [\mathbb{D},\mathbb{D}]_l) & \xrightarrow[\;(\circ)\;]{} & [\mathbb{D},\mathbb{D}]_l
\end{array}$$

We define ε to be the counit of the adjunction F ⊣ U, which is a monoidal natural transformation, and δ to be the following natural transformation:

$$\delta\_{H\_1, H\_2} = V^{F \dashv U}(H\_1 \circ \eta \circ H\_2) : V^{F \dashv U}(H\_1 \circ H\_2) \to V^{F \dashv U}H\_1 \circ V^{F \dashv U}H\_2$$

It is routine to check that these satisfy the prism axioms.

**Theorem 5.** *Let* R *be a partially ordered semiring and* D *be an* R*-graded linear exponential comonad on a symmetric monoidal category* C*. Moreover, let* S *be another partially ordered semiring,* h : S → R *be a colax homomorphism, and* F ⊣ U : C → D *be a symmetric lax monoidal adjunction. Then the following composite of vertical monoid homomorphisms is an* S*-graded linear exponential comonad on* D*.*

$$(S^+,1_S,*_S) \xrightarrow{\;h\;} (R^+,1_R,*_R) \xrightarrow{\;D\;} ([\mathbb{C},\mathbb{C}]_l,\mathrm{Id}_{\mathbb{C}},\circ) \xrightarrow{\;V^{F\dashv U}\;} ([\mathbb{D},\mathbb{D}]_l,\mathrm{Id}_{\mathbb{D}},\circ)$$

We call the above composite the *extension* of D with F ⊣ U and h.

### **7 From Monoid Actions to Graded Comonoid-Coalgebras**

Let (D, ε, δ) : (R+, 1, ∗) → ([C, C]l, Id, ◦) be an R-graded linear exponential comonad, presented as a vertical monoid homomorphism. The prism equalities in Definition 4 suggest that the vertical monoid homomorphism itself can be seen as a monoid. We can thus consider *monoid actions* of (D, ε, δ): such an action consists of a symmetric colax monoidal functor A : R+ → C together with a prism a : (D, A) → A : (∗) → ev

such that the following prism equations hold:

$$a \circledast (\delta, \mathrm{id}) = a \circledast (\mathrm{id}, a), \qquad a \circledast (\epsilon, \mathrm{id}) = \mathrm{id}.$$

We note that this makes sense because (∗) and ev are also monoid actions in **MSMC**l; see Example 5. By unfolding this definition, we obtain the following structure, which we name *graded comonoid-coalgebra*.

**Definition 5.** *Let* R *be a partially ordered semiring. An* R*-*graded comonoid-coalgebra *of an* R*-graded linear exponential comonad* (D, w, c, ε, δ) *on a symmetric monoidal category* C *is a tuple* (A, a, u, o) *consisting of a symmetric colax monoidal functor* (A, u, o) : R+ → C*, with* u : A0 → **I** *and* o_{r,s} : A(r + s) → Ar ⊗ As*, together with a natural transformation* a_{r,s} : A(r ∗ s) → Dr(As)*.*

*They satisfy the following six equational axioms:*

$$\begin{aligned}
Dr(a_{s,t}) \circ a_{r,s*t} &= \delta_{r,s,At} \circ a_{r*s,t} & \epsilon_{At} \circ a_{1,t} &= \mathrm{id}_{At} \\
w_{Ar} \circ a_{0,r} &= u & Dr(u) \circ a_{r,0} &= m_r \circ u \\
(a_{s,r} \otimes a_{t,r}) \circ o_{s*r,t*r} &= c_{s,t,Ar} \circ a_{s+t,r} & m_{r,As,At} \circ (a_{r,s} \otimes a_{r,t}) \circ o_{r*s,r*t} &= Dr(o_{s,t}) \circ a_{r,s+t}
\end{aligned}$$

*A morphism from an* R*-graded comonoid-coalgebra* (A, a, u, o) *to another* (B, b, v, p) *is a monoidal natural transformation* h : (A, u, o) → (B, v, p) *making the following square commute:*

$$\begin{array}{ccc}
A(r*s) & \xrightarrow{\;h_{r*s}\;} & B(r*s) \\
{\scriptstyle a_{r,s}}\big\downarrow & & \big\downarrow{\scriptstyle b_{r,s}} \\
Dr(As) & \xrightarrow[\;Dr(h_s)\;]{} & Dr(Bs)
\end{array}$$

*We write* C(C, D) *for the category of* R*-graded comonoid-coalgebras of* D*.*

**Proposition 6.** *Let* R *be a partially ordered semiring and* (D, w, c, ε, δ) *be an* R*-graded linear exponential comonad on a symmetric monoidal category* C*. The following gives a symmetric monoidal structure on* C(C, D)*:*

$$\begin{aligned}
\mathbf{I} &= (\lambda r \,.\, \mathbf{I},\ \lambda r, s \,.\, m_r,\ \mathrm{id}_{\mathbf{I}},\ \lambda r, s \,.\, \iota^{-1}) \\
(A, a, u, o) \otimes (B, b, v, p) &= \bigl(A \mathbin{\dot\otimes} B,\ \lambda r, r' \,.\, m_{r,Ar',Br'} \circ (a_{r,r'} \otimes b_{r,r'}),\ \iota \circ (u \otimes v),\ \lambda r, r' \,.\, \tau \circ (o_{r,r'} \otimes p_{r,r'})\bigr) \\
(f \otimes g)_r &= f_r \otimes g_r \\
(\lambda_A)_r &= \lambda_{Ar}, \qquad (\rho_A)_r = \rho_{Ar}, \qquad (\alpha_{A,B,C})_r = \alpha_{Ar,Br,Cr}, \qquad (\sigma_{A,B})_r = \sigma_{Ar,Br}
\end{aligned}$$

When R = 1, the category C(C, D) reduces to the category of Eilenberg–Moore coalgebras of the non-graded linear exponential comonad.

**Theorem 6.** *Let* (D, w, c, ε, δ) *be a* 1*-graded linear exponential comonad on a symmetric monoidal category* C*. Then the category* C(C, D) *is strong monoidally isomorphic to the category* C<sup>D</sup> *of Eilenberg–Moore coalgebras of the comonad* (D, ε, δ)*.*

As in the case of C<sup>D</sup>, there is a symmetric lax monoidal adjunction of the following type:

$$F \dashv U : C(\mathbb{C}, D) \to \mathbb{C}$$

but this adjunction by itself is not enough to recover D: D takes two arguments, while the composite F ◦ U is only equal to the symmetric lax monoidal comonad D1 on C. The category C(C, D) actually carries an R-*twist* T, which acts on comonoid-coalgebras as follows:

$$Tr(A, \cdots) = (A(- * r), \cdots),$$

and D is recovered as the extension of T with the adjunction F ⊣ U (Theorem 5).

**Theorem 7.** *Let* R *be a partially ordered semiring and* (D, w, c, ε, δ) *be an* R*-graded linear exponential comonad on a symmetric monoidal category* C*.*

*1. There is a symmetric lax monoidal adjunction* F ⊣ U : C(C, D) → C*.*

*2. The following data define an* R*-twist* (T, w^T, c^T) *on* C(C, D)*:*

$$\begin{aligned}
TrA &= (\lambda s \,.\, A(s * r),\ \lambda s, s' \,.\, a_{s,s'*r},\ u,\ \lambda s, s' \,.\, o_{s*r,s'*r}), \qquad (Trh)_t = h_{t*r} \\
(m^T_r)_t &= \mathrm{id}_{\mathbf{I}}, \qquad (m^T_{r,A,B})_t = \mathrm{id}_{A(t*r)\otimes B(t*r)}, \qquad (w^T_A)_t = u, \qquad (c^T_{r,s,A})_t = o_{t*r,\,t*s}.
\end{aligned}$$

*Here,* A = (A, a, u, o) *and* B *are* R*-graded comonoid-coalgebras. From the definition of twists,* ε^T *and* δ^T *are identities.*

*3. The extension of* T *with* F ⊣ U *(Theorem 5) coincides with the* R*-graded linear exponential comonad* D*.*

The following classic result [1, Theorem 6-1] can be reproved using Theorem 7.

**Corollary 1.** *Let* C *be a symmetric monoidal category and let* D *be a non-graded linear exponential comonad on* C*. The canonical symmetric monoidal structure on the category* C<sup>D</sup> *of Eilenberg–Moore coalgebras of* D *is cartesian.*

*Proof.* By Theorem 1, D is a 1-graded linear exponential comonad on C, so C(C, D) carries a 1-twist by Theorem 7. The symmetric monoidal structure of C(C, D) is therefore cartesian by Theorem 2. Finally, C(C, D) is strong monoidally isomorphic to C<sup>D</sup> by Theorem 6, hence the symmetric monoidal structure of C<sup>D</sup> is also cartesian.

We now show a finality property of the category of graded comonoid-coalgebras. Let R be a partially ordered semiring and D be an R-graded linear exponential comonad on a symmetric monoidal category C. We define a *resolution* of D to be a pair of a symmetric lax monoidal adjunction J ⊣ K : E → C and an R-twist (S, w^S, c^S) on E such that the extension of S with J ⊣ K is equal to D. Then the following data form a strong monoidal functor (M, m^M, m^M_{E,E′}) : E → C(C, D):

$$\begin{aligned}
ME &= \Bigl(\lambda r \,.\, J(Sr)E,\ \lambda r, r' \,.\, J(Sr)\eta^{J \dashv K}_{Sr'E},\ (m^J)^{-1} \circ w^S_E,\ \lambda r, r' \,.\, \bigl(m^J_{SrE,\,Sr'E}\bigr)^{-1} \circ Jc^S_{r,r',E}\Bigr) \\
(Mf)_r &= J(Sr)f, \qquad (m^M)_r = J(m^S_r) \circ m^J, \qquad (m^M_{E,E'})_r = J(m^S_{r,E,E'}) \circ m^J_{SrE,\,SrE'}
\end{aligned}$$

(recall that Sr, J are both symmetric strong monoidal).

**Theorem 8.** *The above* M *is the unique symmetric strong monoidal functor such that:*


$$\begin{array}{ccc}
R^+ & \xrightarrow{\;S\;} & [\mathbb{E},\mathbb{E}]_l \\
{\scriptstyle T}\big\downarrow & & \big\downarrow{\scriptstyle M \circ -} \\
[C(\mathbb{C},D), C(\mathbb{C},D)]_l & \xrightarrow[\;- \circ M\;]{} & [\mathbb{E}, C(\mathbb{C},D)]_l
\end{array}$$

### **8 Conclusion**

We have given a concise characterization of graded linear exponential comonads as vertical monoid homomorphisms (D, ε, δ) from (R+, 1, ∗) to ([C, C]l, Id, ◦). This characterization is built upon a combination of the theory of symmetric lax monoidal multifunctors and Grandis and Paré's double category of symmetric monoidal categories. On top of this characterization, we considered *monoid actions* and derived the concept of graded comonoid-coalgebras. The category of graded comonoid-coalgebras is shown to give a resolution of the graded linear exponential comonad D. These results are consistent with the theory of non-graded linear exponential comonads developed in [1].

It remains to be seen whether the category of graded comonoid-coalgebras can be constructed in a purely double-category theoretic way. In the non-graded case, there are other types of categorical models of the exponential modality, namely *Lafont categories* and *Seely categories* [17]. Graded versions of these categories are also an interesting research topic.

**Acknowledgment.** The author is grateful to Marco Gaboardi, Naohiko Hoshino, Flavien Breuvart, Soichiro Fujii and Paul-André Melliès for many fruitful discussions. This research was supported by JSPS KAKENHI Grant Number JP15K00014 and ERATO Hasuo Metamathematics for Systems Design Project (No. JPMJER1603), JST.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Depending on Session-Typed Processes**

Bernardo Toninho<sup>1,2</sup> and Nobuko Yoshida<sup>2</sup>

<sup>1</sup> NOVA-LINCS, Departamento de Informática, FCT, Universidade Nova de Lisboa, Lisbon, Portugal <sup>2</sup> Imperial College London, London, UK b.toninho@imperial.ac.uk

**Abstract.** This work proposes a dependent type theory that combines functions and session-typed processes (with value dependencies) through a contextual monad, internalising typed processes in a dependently-typed λ-calculus. The proposed framework, by allowing session processes to depend on functions and vice-versa, enables us to specify and statically verify protocols where the choice of the next communication action can depend on specific values of received data. Moreover, the type theoretic nature of the framework endows us with the ability to internally describe and prove predicates on process behaviours. Our main results are type soundness of the framework, and a faithful embedding of the functional layer of the calculus within the session-typed layer, showcasing the expressiveness of dependent session types.

### **1 Introduction**

Session types [14,24] are a typing discipline for communication protocols, whose simplicity provides an extensible framework that allows for integration with a variety of functional type features. One useful instance arising from the proof theoretic exploration of logical quantification is *value dependent session types* [25]. In that work, one can express properties of exchanged data in protocol specifications separately from communication, but *cannot* describe protocols where communication actions depend on the actual exchanged data (e.g. [16, Sect. 2]). Moreover, it does not allow functions or values to depend on protocols (i.e. sessions) or communication, thus preventing reasoning about dependent process behaviours and exploring the proofs-as-programs paradigm of dependent type theory, e.g. [8,17].

Our work addresses the limitations of existing formulations of session types by proposing a type theory that integrates dependent functions *and* session types using a *contextual monad*. This monad internalises a session-typed calculus within a dependently-typed λ-calculus. By allowing session types to depend on λ-terms *and* λ-terms to depend on typed processes (using the monad), we are able to achieve heightened degrees of expressiveness. Exploiting the former direction, we enable writing actual data-dependent communication protocols. Exploiting the latter, we can define and *prove* properties of linearly-typed objects (i.e. processes) within our intuitionistic theory.

© The Author(s) 2018

C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 128–145, 2018. https://doi.org/10.1007/978-3-319-89366-2_7

To informally demonstrate how our type theory goes beyond the state of the art in representing data-dependent protocols, consider the following session type (we write τ ∧ A for ∃x:τ.A when x does not occur in A, and similarly τ ⊃ A for ∀x:τ.A when x is not free in A): T ≜ Bool ⊃ ⊕{t : Nat ∧ **1**, f : Bool ∧ **1**}, representable in existing session typing systems. The type T denotes a protocol which first inputs a boolean and then either emits the label t, followed by an output of a natural number, or emits the label f followed by a boolean. The intended protocol described by T is to take the t branch if the received value is true and the f branch otherwise, which we can implement as the process Q below, with channel z typed by T:

$$Q \triangleq z(x). \mathsf{case} \ x \ \mathsf{of} \ (\mathsf{true} \Rightarrow z. \mathsf{t}; z \langle 23 \rangle. \mathsf{0}, \ \mathsf{false} \Rightarrow z. \mathsf{f}; z \langle \mathsf{true} \rangle. \mathsf{0})$$

where z(x).P denotes an input process, z.t is a process which selects label t, and z⟨23⟩.P is an output on z. However, since the specification is imprecise, the process z(x).case x of (false ⇒ z.t; z⟨23⟩.**0**, true ⇒ z.f; z⟨true⟩.**0**) is also a type-correct implementation of T that does not adhere to the intended protocol. Using our dependent type system, we can narrow the specification to guarantee that the desired protocol is precisely enforced. Consider the following definition of a session-type-level conditional, where we assume inductive definitions and a dependent pattern matching mechanism (stype denotes the *kind* of session types):

$$\begin{array}{l} \mathtt{if} :: \mathtt{Bool} \to \mathtt{stype} \to \mathtt{stype} \\ \mathtt{if}\ \mathtt{true}\ A\ B \;=\; A \qquad \mathtt{if}\ \mathtt{false}\ A\ B \;=\; B \end{array}$$

The type-level function above case analyses the boolean and produces its first session type argument if the value is true and the second otherwise. We may now specify a session type that faithfully implements the protocol:

$$T' \triangleq \forall x{:}\mathtt{Bool}.\,\mathtt{if}\ x\ (\mathtt{Nat} \wedge \mathbf{1})\ (\mathtt{Bool} \wedge \mathbf{1})$$

A process R implementing such a type on channel z is given below:

$$R \triangleq z(x). \mathsf{case}\ x \text{ of } (\mathsf{true} \Rightarrow z \langle 23 \rangle. \mathsf{0}, \ \mathsf{false} \Rightarrow z \langle \mathsf{true} \rangle. \mathsf{0})$$

Note that if we flip the two branches of the case analysis in R, the session is no longer typable with T′, ensuring that the protocol is implemented faithfully.
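To make the data dependency concrete, the following Python sketch (ours, not from the paper) mirrors R as a generator, with a driver that enforces dynamically what T′ guarantees statically: the type of the payload is determined by the received boolean. The names `R` and `run_session` are illustrative; the paper's guarantee is purely static, whereas this sketch can only check at runtime.

```python
# Dynamic analogue of T' = ∀x:Bool. if x (Nat ∧ 1) (Bool ∧ 1).
# The paper enforces this statically; here the check happens at runtime.

def R():
    """Mirrors R ≜ z(x).case x of (true ⇒ z⟨23⟩.0, false ⇒ z⟨true⟩.0)."""
    x = yield            # input along z
    if x:
        yield 23         # Nat branch
    else:
        yield True       # Bool branch

def run_session(process, value):
    """Drive one session of (the analogue of) type T': send a boolean,
    receive one payload, and check its type against the branch T' dictates."""
    next(process)                      # run the process to its input point
    payload = process.send(value)      # deliver the boolean, collect output
    # T' dictates: a (non-boolean) natural for true, a boolean for false.
    if value:
        ok = isinstance(payload, int) and not isinstance(payload, bool)
    else:
        ok = isinstance(payload, bool)
    if not ok:
        raise TypeError(f"protocol violation: {payload!r} after input {value}")
    return payload
```

`run_session(R(), True)` yields 23 and `run_session(R(), False)` yields True, while a process with the two branches flipped is rejected, echoing how the flipped R fails to typecheck against T′.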

The example above illustrates a simple yet useful data-dependent protocol. When we further extend our dependent types with a *process* monad [29], where {c ← P ← u<sub>j</sub>; d<sub>i</sub>} is a functional term denoting a process that may be *spawned* by other processes by instantiating the names in u<sub>j</sub> and d<sub>i</sub>, we can provide more powerful reasoning on processes, enabling refined specifications through the use of type indices (i.e. type families) and an ability to internally specify and verify predicates on process behaviours. We also show that *all* functional types and terms can be faithfully embedded in the process layer using the dependently-typed sessions and process monads.

**Contributions.** Section 2 introduces our dependent type theory, augmenting the example above by showing how we can reason about process behaviour using


**Fig. 1.** Syntax of kinds, types, terms and processes

type families and dependently-typed functions (Sect. 2.3). We then establish the soundness of the theory (Sect. 2.4). Section 3 develops a faithful embedding of the dependent function space in the process layer (Theorem 3.4). Section 4 concludes with related work. Proofs, omitted definitions and additional examples can be found in [32].

### **2 A Dependent Type Theory of Processes**

This section introduces our dependent type theory combining session-typed processes and functions. The theory is a generalisation of the line of work relating linear logic and session types [4,25,29], considering type-level functions and dependent kinds in an intensional type theory with full *mutual* dependencies between functions and processes. This generalisation enables us to express more sophisticated session types (such as those of Sect. 1) and also to define and *prove* properties of processes expressed as type families with proofs as their inhabitants. We focus on the new rules and judgements, pointing the interested reader to [5,25,26] for additional details on the base theory.

#### **2.1 Syntax**

The calculus is stratified into two mutually dependent layers of processes and terms, which we often refer to as the *process* and *functional* layers, respectively. The syntax of the theory is given in Fig. 1 (we use x, y for variables ranging over terms and t for variables ranging over types).

**Types and Kinds.** The process layer is able to refer to terms of the functional layer via appropriate (dependently-typed) communication actions and through a *spawn* construct, allowing for processes encapsulated as functional values to be executed. Dually, the functional layer can refer to the process layer via a *contextual* monad [29] that internalises (open) typed processes as opaque functional values. This mutual dependency is also explicit in the type structure on several axes: process channel usages are typed by a language of session types, which specifies the communication protocols implemented on the used channels, extended with two dependent communication operations <sup>∀</sup>x:τ.A and <sup>∃</sup>x:τ.A, where <sup>τ</sup> is a functional type and A is a session type in which x may occur. Moreover, we also extend the language of session types with type-level λ-abstraction over terms λx:τ.A and session types λt:: K.A (with the corresponding elimination forms A M and A B). As we show in Sect. 1, the combination of these features allows for a new degree of expressiveness, enabling us to construct session types whose structure depends on previously communicated values.

The remaining session constructs are standard, following [5]: !A denotes a *shared* session of type A that may be used an arbitrary (finite) number of times; A ⊸ B represents a session offering to input a session of type A to then offer the session behaviour B; A ⊗ B is the dual operator, denoting a session that outputs A and proceeds as B; ⊕{l<sub>i</sub> : A<sub>i</sub>} and &{l<sub>i</sub> : A<sub>i</sub>} represent internal and external labelled choice, respectively; **1** denotes the terminated session.

The functional layer is a λ-calculus with dependent functions Πx:τ.σ, type-level λ-abstractions over terms and types (and respective type-level applications) and a *contextual monadic* type {u<sub>j</sub>:B<sub>j</sub>; d<sub>i</sub>:A<sub>i</sub> ⊢ c:A}, denoting a (quoted) process offering session *c*:*A* by using the *linear* sessions d<sub>i</sub>:A<sub>i</sub> and *shared* sessions u<sub>j</sub>:B<sub>j</sub> [29]. We often write {A} for {·; · ⊢ c:A}. The kinding system for our theory contains two base kinds type and stype of functional and session types, respectively. Type-level λ-abstractions require dependent kinds Πx:τ.K and Πt:: K.K′, respectively. We note that the functional connectives form a standard dependent type theory [11,21].

**Terms and Processes.** Terms include the standard λ-abstractions λx:τ.M, applications M N and variables x. In order to internalise processes within the functional layer we make use of a monadic process wrapper, written {c ← P ← u<sub>j</sub>; d<sub>i</sub>}. In such a construct, the channels c, u<sub>j</sub> and d<sub>i</sub> are bound in P, where c is the session channel being offered and u<sub>j</sub> and d<sub>i</sub> are the session channels (shared and linear, respectively) being used. We write {c ← P ← } when P does not use any ambient channels, which we abbreviate to {P}.

The syntax of processes follows that of [5], extended with the monadic elimination form c ← M ← u<sub>j</sub>; d<sub>i</sub>; Q. Such a process construct denotes a term M that is to be evaluated to a monadic value of the form {c ← P ← u<sub>j</sub>; d<sub>i</sub>}, which will then be executed in parallel with Q, sharing with it a session channel c and using the provided channels u<sub>j</sub> and d<sub>i</sub>. We write c ← M ← ; Q when no channels are provided for the execution of M and often abbreviate this to c ← M; Q. The process c̄⟨d⟩.P denotes the output of the *fresh* channel d along channel c with continuation P, which binds d; (*ν*c)P denotes channel hiding, restricting the scope of c to P; c(x).P denotes an input along c, bound to x in P; c⟨M⟩.P denotes the output of term M along c with continuation P; !c(x).P denotes a replicated input which spawns copies of P; the construct c.case{l<sub>i</sub> ⇒ P<sub>i</sub>} codifies a process that waits to receive some label l<sub>j</sub> along c, with continuation P<sub>j</sub>; dually, c.l; P denotes a process that emits a label l along c and continues as P; [c ↔ d] denotes a forwarder between c and d, which is operationally implemented as renaming; P | Q denotes parallel composition and **0** the null process.
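The process grammar just described can be transcribed directly as an algebraic datatype. The following Python sketch (constructor names are ours, covering only a fragment of Fig. 1) fixes one such representation and builds the label-selection process z.t; z⟨23⟩.**0** from the introduction as an example.

```python
# A transcription of part of the process grammar of Fig. 1 as dataclasses.
from dataclasses import dataclass
from typing import Any

class Proc:
    """Base class for process terms."""

@dataclass(frozen=True)
class Nil(Proc):                 # 0: the null process
    pass

@dataclass(frozen=True)
class OutTerm(Proc):             # c⟨M⟩.P: output of term M along c
    chan: str
    term: Any
    cont: Proc

@dataclass(frozen=True)
class In(Proc):                  # c(x).P: input along c, bound to x in P
    chan: str
    var: str
    cont: Proc

@dataclass(frozen=True)
class Select(Proc):              # c.l; P: emit label l along c, continue as P
    chan: str
    label: str
    cont: Proc

@dataclass(frozen=True)
class Fwd(Proc):                 # [c ↔ d]: forwarder, operationally a renaming
    left: str
    right: str

# z.t; z⟨23⟩.0 — the true-branch body of Q from the introduction.
true_branch = Select("z", "t", OutTerm("z", 23, Nil()))
```

Frozen dataclasses give structural equality for free, which is convenient when comparing process terms syntactically.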

#### **2.2 A Dependent Typing System**

We now introduce our typing system, defined by a series of mutually inductive judgements, given in Fig. 2. We use Ψ to stand for a typing context for dependent


**Fig. 2.** Typing judgements

λ-terms (i.e. assumptions of the form x:τ or t:: K, not subject to exchange), Γ for a typing context for *shared* sessions of the form *u*:*A* (implicitly subject to weakening and contraction) and Δ for a linear context of sessions *x*:*A*. The context well-formedness judgments for Ψ and for Ψ; Δ require that the types and kinds (resp. session types) in Ψ (resp. Δ) are well-formed. The judgments Ψ ⊢ K, Ψ ⊢ τ :: K and Ψ ⊢ A :: K codify well-formedness of kinds, functional types and session types (with kind K), respectively. Their rules are standard.

**Typing.** An excerpt of the typing rules for terms and processes is given in Figs. 3 and 4, respectively, noting that typing enforces types to be of base kind type (respectively stype). The rules for dependent functions are standard, including the type conversion rule which internalises definitional equality of types. We highlight the introduction rule for the monadic construct, which requires the appropriate session types to be well-formed and the process P to offer *c*:*A* when provided with the appropriate session contexts.

In the typing rules for processes (Fig. 4), presented as a set of right and left rules (the former identifying how to *offer* a session of a given type and the latter how to use such a session), we highlight the rules for dependently-typed communication and monadic elimination (for type-checking purposes we annotate constructs with the respective dependent type – this is akin to functional type theories). To offer a session <sup>c</sup>:∃x:τ.A we send a term <sup>M</sup> of type <sup>τ</sup> and then offer a session <sup>c</sup>:A{M/x}; dually, to use such a session we perform an input along <sup>c</sup>, bound to x in Q, warranting a use of c as a session of (open) type A. The rules for the universal are dual. Offering a session <sup>c</sup>:∀x:τ.A entails receiving on <sup>c</sup> <sup>a</sup> term of type τ and offering *c*:*A*. Using a session of such a type requires sending along <sup>c</sup> a term <sup>M</sup> of type <sup>τ</sup> , warranting the use of <sup>c</sup> as a session of type <sup>A</sup>{M/x}.

The rule for the monadic elimination form requires that the term M be of the appropriate monadic type and that the provided channels u<sub>j</sub> and y<sub>i</sub> adhere to the typing specified in M's type. Under these conditions, the process Q may then use the session c as a session of type A. The type conversion rules reflect session type definitional equality in typing.

$$
\frac{\Psi \vdash \tau :: \mathsf{type} \quad \Psi, x{:}\tau \vdash M : \sigma}{\Psi \vdash \lambda x{:}\tau.M : \Pi x{:}\tau.\sigma}\ (\Pi I)
\qquad
\frac{\Psi \vdash M : \Pi x{:}\tau.\sigma \quad \Psi \vdash N : \tau}{\Psi \vdash M\,N : \sigma\{N/x\}}\ (\Pi E)
$$
$$
\frac{\Psi \vdash \overline{B_j} :: \mathsf{stype} \quad \Psi \vdash \overline{A_i} :: \mathsf{stype} \quad \Psi \vdash A :: \mathsf{stype} \quad \Psi;\ \overline{u_j{:}B_j};\ \overline{d_i{:}A_i} \vdash P :: c{:}A}{\Psi \vdash \{c \leftarrow P \leftarrow \overline{u_j};\overline{d_i}\} : \{\overline{u_j{:}B_j};\ \overline{d_i{:}A_i} \vdash c{:}A\}}\ (\{\}I)
$$

**Fig. 3.** Typing for terms (Excerpt – See [32])

**Fig. 4.** Typing for processes (Excerpt – See [32])

**Definitional Equality.** The crux of any dependent type theory lies in its *definitional equality*. Type equality relies on equality of terms which, by including the monadic construct, necessarily relies on a notion of *process* equality.

Our presentation of an intensional definitional equality of terms follows that of [12], where we consider an intrinsically typed relation, including β and η conversion (similarly for type equality which includes β and η principles for the type-level λ-abstractions). An excerpt of the rules for term equality is given in Fig. 5. The remaining rules are congruence rules and closure under symmetry, reflexivity and transitivity. Rule (TMEqβ) captures the β-reduction, identifying a λ-abstraction applied to an argument with the substitution of the argument in the function body (typed with the appropriately substituted type). We highlight rule (TMEq{}η), which codifies a general <sup>η</sup>-like principle for arbitrary terms of monadic type: We form a monadic term that applies the monadic elimination form to M, forwarding the result along the appropriate channel, which becomes a term equivalent to M.

**Fig. 5.** Definitional equality of terms (Excerpt – See [32])

Definitional equality of processes is summarised in Fig. 6. We rely on process reduction, defined below. Definitional equality of processes consists of the usual congruence rules, (typed) reductions, the commuting conversions of linear logic, and η-like principles, which allow forwarding actions to be equated with the primitive syntactic forwarding construct. Commuting conversions amount to sound observational equivalences between processes [22], given that session composition requires name restriction (embodied by the (cut) rule): in rule (PEqCC∀), either process can only be interacted with via channel c, and so postponing actions of P to after the input on c (when reading the equality from left to right) cannot impact the process' observable behaviours. While P can in general interact with sessions in Δ (or with Q), these interactions are unobservable due to hiding in the (cut) rule.

**Operational Semantics.** The operational semantics for the λ-calculus is standard, noting that no reduction can take place inside monadic terms. The operational (reduction) semantics for processes is presented below where we omit closure under structural congruence and the standard congruence rules [4,25,29]. The last rule defines spawning a process in a monadic term.

$$
\begin{array}{ll}
c\langle M\rangle.P \mid c(x).Q \to P \mid Q\{M/x\} & \overline{c}\langle x\rangle.P \mid c(x).Q \to (\nu x)(P \mid Q)\\
!c(x).P \mid \overline{c}\langle x\rangle.Q \to\ !c(x).P \mid (\nu x)(P \mid Q) & c.\mathtt{case}\{\overline{l_i \Rightarrow P_i}\} \mid c.l_j;Q \to P_j \mid Q\ \ (l_j \in \overline{l_i})\\
(\nu c)(P \mid [c \leftrightarrow d]) \to P\{d/c\} & c \leftarrow \{c \leftarrow P \leftarrow \overline{u_j};\overline{d_i}\} \leftarrow \overline{u_j};\overline{d_i};Q \to (\nu c)(P \mid Q)
\end{array}
$$
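The first rule, value passing, can be simulated with two coroutines and a one-step scheduler. The sketch below is ours (a toy rendezvous, not the paper's semantics): processes are Python generators yielding `('send', M)` or `('recv', None)` actions, and `communicate` performs one reduction c⟨M⟩.P | c(x).Q → P | Q{M/x}.

```python
# Toy simulation of the value-passing reduction c⟨M⟩.P | c(x).Q → P | Q{M/x}.
# Processes are generators; one rendezvous delivers the sender's payload
# into the receiver's bound variable, then resumes both continuations.

def communicate(sender, receiver):
    kind, payload = next(sender)         # sender stuck at its output c⟨M⟩
    assert kind == 'send'
    kind, _ = next(receiver)             # receiver stuck at its input c(x)
    assert kind == 'recv'
    for proc, arg in ((receiver, payload), (sender, None)):
        try:
            proc.send(arg)               # resume Q{M/x} and P
        except StopIteration:
            pass                         # continuation was 0

def P():                                 # c⟨23⟩.0
    yield ('send', 23)

def Q(log):                              # c(x).0, logging the received x
    x = yield ('recv', None)
    log.append(x)
```

Running `communicate(P(), Q(log))` leaves the sent value in `log`, the runtime trace of the substitution Q{M/x}.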

#### **2.3 Example – Reasoning About Processes Using Dependent Types**

The use of type indices (i.e. type families) in dependently typed frameworks adds information to types to produce more refined specifications. Our framework enables us to do this at the level of session types.

Consider a session type that "counts down" on a natural number (we assume inductive definitions and dependent pattern matching in the style of [21]):

```
countDown :: Πx:Nat.stype
countDown (succ(n)) = ∃y:Nat.countDown(n)
countDown z         = 1
```

The type family countDown(n) denotes a session type that emits exactly n numbers and then terminates. We can now write a (dependently-typed) function that produces processes with the appropriate type, given a starting value:

```
counter :: Πx:Nat.{countDown(x)}
counter (succ(n)) = {c ← c⟨succ(n)⟩.d ← counter(n); [d ↔ c]}
counter z         = {c ← 0}
```

Note how the type of counter, through the type family countDown, allows us to specify exactly the number of times a value is sent. This is in sharp contrast with existing recursive (or inductive/coinductive [18,30]) session types, where one may only specify the general iterative nature of the behaviour (e.g. "send a number and then recurse or terminate").
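As a runnable analogue (ours, in Python rather than the paper's calculus): countDown(n)'s promise that the session emits exactly n numbers becomes a generator whose output length is exactly its argument. The dependent type checks this statically; the sketch can only exhibit it dynamically.

```python
# Executable sketch of counter : Πx:Nat.{countDown(x)}. The recursive
# composition d ← counter(n); [d ↔ c] is flattened into iteration; the
# dependent type's guarantee becomes: exactly n values are emitted.

def counter(n):
    while n > 0:
        yield n      # c⟨succ(n)⟩ ...
        n -= 1       # ... then behave as counter(n), forwarded along c
```

For example, `list(counter(3))` is `[3, 2, 1]`: the "session" emits exactly three numbers and terminates, as countDown(3) specifies.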

The example above relies on session type indexing in order to provide additional static guarantees about processes (and the functions that generate them). An alternative way is to consider "simply-typed" programs and then *prove* that they satisfy the desired properties, using the language itself. Consider a simplytyped version of the counter above described as an inductive session type:

```
simpleCounterT :: stype
simpleCounterT = ⊕{dec : Nat ∧ simpleCounterT, done : 1}
```

There are many processes that correctly implement such a type, given that the type merely dictates that the session outputs a natural number and recurses (modulo the dec and done messages to signal which branch of the internal choice is taken). A function that produces processes implementing such a session, mirroring those generated by the counter function above, is:


The process generated by simpleCounter, after emitting the dec label, spawns a process in parallel that sends the appropriate number, which is received by the parallel thread and then sent along the session c. Despite its simplicity, this example embodies a general pattern where a computation is spawned in parallel (itself potentially spawning many other threads) and the main thread then waits for the result before proceeding.

While such a process is typable in most session typing frameworks, our theory enables us to *prove* that the counter implementation above indeed counts down from a given number by defining an appropriate (inductive) type family, indexed by *monadic* values (i.e. processes):

```
corrCount :: Πx:Nat.Πy:{simpleCounterT}.type
corrz : corrCount z {c ← c.done; 0}
corrn : Πn:Nat.ΠP:{simpleCounterT}.corrCount n P →
          corrCount (succ(n)) {c ← c.dec; c⟨succ(n)⟩.d ← P; [d ↔ c]}
```
The type family corrCount, indexed by a natural number and a monadic value implementing the session type simpleCounter, is defined via two constructors: corr*z*, which specifies that a correct 0 counter emits the done label and terminates; and corr*n*, which given a monadic value P that is a correct n-counter, defines that a correct (n + 1)-counter emits n + 1 and then proceeds as P (modulo the label emission bookkeeping).

The proof of correctness of the simpleCounter function above is no more than a function of type Πn:Nat.corrCount n (simpleCounter(n)), defined below:

```
prf : Πn:Nat.corrCount n (simpleCounter(n))
prf z         = corrz
prf (succ(n)) = corrn n (simpleCounter(n)) (prf n)
```

Note that in this scenario, the processes that index the corrCount type family are not syntactically equal to those generated by simpleCounter, but rather *definitionally* equal.

Typically, the processes that index such correctness specifications tend to be distilled versions of the actual implementations, which often perform some additional internal computation or communication steps. Since our notion of definitional equality of processes includes reduction (and also commuting conversions which account for type-preserving shuffling of internal communication actions [26]), the type conversion mechanism allows us to use the techniques described above to generally reason about specification conformance.
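A runtime intuition for corrCount (our sketch, not the paper's type family): conformance of an implementation to the distilled specification can be phrased as equality of observable traces, which is what definitional equality (reduction plus commuting conversions) establishes once and for all at the type level. All function names below are ours.

```python
# Dynamic counterpart of corrCount n P: check that the implementation's
# observable trace equals the canonical trace 'dec', n, 'dec', n-1, ..., 'done'.

def spec_trace(n):
    """Trace of the canonical counter that indexes corrCount."""
    out = []
    for k in range(n, 0, -1):
        out += ['dec', k]
    return out + ['done']

def impl_trace(n):
    """Trace of a simpleCounterT implementation that, like simpleCounter,
    spawns a helper to produce each number before resending it."""
    out = []
    while n > 0:
        out.append('dec')
        helper = iter([n])         # the spawned parallel thread sending n
        out.append(next(helper))   # received, then sent along the session
        n -= 1
    out.append('done')
    return out

def corr_count(n):
    """Dynamic analogue of inhabiting corrCount n (simpleCounter(n))."""
    return impl_trace(n) == spec_trace(n)
```

The internal spawn-and-receive detour in `impl_trace` is invisible in the trace, mirroring how the commuting conversions absorb the implementation's extra internal steps.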

### **2.4 Type Soundness of the Framework**

The main goal of this section is to present type soundness of our framework through a subject reduction result. We also show that our theory guarantees progress for terms and processes. The development requires a series of auxiliary results (detailed in [32]) pertaining to the functional and process layers which are ultimately needed to produce the inversion properties necessary to establish subject reduction. We note that strong normalisation results for linear-logic based session processes are known in the literature [3,26,30], even in the presence of impredicative polymorphism, restricted corecursion and higher-order data. Such results are directly applicable to our work using appropriate semantics preserving type erasures.

In the remainder we often write Ψ ⊢ J to stand for a well-formedness, typing or definitional equality judgment of the appropriate form, and similarly Ψ; Γ; Δ ⊢ J. We begin with the substitution property, which naturally holds for both layers, noting that the dependently typed nature of the framework requires substitution in contexts, terms and types alike.

### **Lemma 2.1 (Substitution).** *Let* Ψ ⊢ M : τ*. Then:*

*1. If* Ψ, x:τ, Ψ′ ⊢ J *then* Ψ, Ψ′{M/x} ⊢ J{M/x}*;*
*2. If* Ψ, x:τ, Ψ′; Γ; Δ ⊢ J *then* Ψ, Ψ′{M/x}; Γ{M/x}; Δ{M/x} ⊢ J{M/x}*.*

Combining substitution with a form of functionality for typing (i.e. that substitution of equal terms in a well-typed term produces equal terms) and for equality (i.e. that substitution of equal terms in a definitional equality proof produces equal terms), we can establish validity for typing and equality, which is a form of internal soundness of the type theory stating that judgments are consistent across the different levels of the theory.

**Lemma 2.2 (Validity for Typing).** (1) *If* Ψ ⊢ τ :: K *or* Ψ ⊢ A :: K *then* Ψ ⊢ K*;* (2) *If* Ψ ⊢ M : τ *then* Ψ ⊢ τ :: type*; and* (3) *If* Ψ; Γ; Δ ⊢ P :: z:A *then* Ψ ⊢ A :: stype*.*

#### **Lemma 2.3 (Validity for Equality)**


With these results we establish the appropriate inversion and injectivity properties which then enable us to show unicity of types (and kinds).

#### **Theorem 2.4 (Unicity of Types and Kinds)**


*4. If* Ψ ⊢ A :: K *and* Ψ ⊢ A :: K′ *then* Ψ ⊢ K = K′*.*

All the results above, combined with the process-level properties established in [5,26,27] enable us to show the following:

**Theorem 2.5 (Subject Reduction – Terms).** *If* Ψ ⊢ M : τ *and* M −→ M′ *then* Ψ ⊢ M′ : τ*.*

**Theorem 2.6 (Subject Reduction – Processes).** *If* Ψ; Γ; Δ ⊢ P :: z:A *and* P −→ P′ *then* ∃Q *such that* P′ ≡ Q *and* Ψ; Γ; Δ ⊢ Q :: z:A*.*

**Theorem 2.7 (Progress – Terms).** *If* Ψ ⊢ M : τ *then either* M *is a value or* M −→ M′*.*

As is common in logic-based session type theories, typing enforces a strong notion of *global* progress, which states that closed processes waiting to perform communication actions cannot get stuck (this relies on a notion of *live* process, defined as live(P) iff P ≡ (νñ)(π.Q | R) for some process R, sequence of names ñ and non-replicated guarded process π.Q). We note that the restricted typing for P is without loss of generality, due to the (cut) rule.

**Theorem 2.8 (Progress – Processes).** *If* Ψ; ·; · ⊢ P :: c:**1** *and* live(P) *then* ∃Q *such that* P −→ Q*.*

### **3 Embedding the Functional Layer in the Process Layer**

Having introduced our type theory and showcased some of its informal expressiveness in terms of the ability to specify and *statically* verify true data dependent protocols, as well as the ability to prove properties of processes, we now develop a formal expressiveness result for our theory, showing that the process level type constructs are able to encode the dependently-typed functional layer, faithfully preserving type dependencies.

Specifically, we show that (1) the type-level constructs in the functional layer can be represented by those in the process layer combined with the contextual monad type, and (2) all term-level constructs can be represented by session-typed processes that exchange monadic values. Thus, we show that both λ-abstraction and application can be eliminated while still preserving non-trivial type dependencies. Crucially, we note that the monadic construct *cannot* be fully eliminated, due to the cross-layer nature of session type dependencies: in the process layer, simply-kinded dependent types (i.e. types with kind stype) are of the form ∀x:τ.A, where τ is of kind type and A of kind stype (and x may occur in A). Operationally, such a session denotes an input of some term M of type τ with a continuation of type A{M/x}. Thus, to faithfully encode type dependencies, we cannot represent such a type with a non-dependently typed input (e.g. a type of the form A ⊸ B).

#### **3.1 The Embedding**

**A first attempt.** Given the observation above, a seemingly reasonable option would be to attempt an encoding that maintains monadic objects solely at the level of type indices and then exploits Girard's encoding [9] of function types τ → σ as !τ ⊸ σ, which is adequate for session-typed processes [28]. Thus a candidate encoding for the type Πx:τ.σ would be ∀x:{⟦τ⟧}.!⟦τ⟧ ⊸ ⟦σ⟧, where ⟦−⟧ denotes our encoding on types. If we then consider the encoding at the level of terms, typing dictates the following (we write ⟦M⟧<sub>z</sub> for the process encoding of M : τ, where z is the session channel along which one may observe the "result" of the encoding, typed with ⟦τ⟧):

$$
\begin{array}{l}
[\![\lambda x{:}\tau.M]\!]_z \triangleq z(x).z(x').[\![M]\!]_z\\[2pt]
[\![M\,N]\!]_z \triangleq (\nu x)([\![M]\!]_x \mid x\langle\{[\![N]\!]_y\}\rangle.\overline{x}\langle x'\rangle.(!x'(y).[\![N]\!]_y \mid [x \leftrightarrow z]))
\end{array}
$$

However, this candidate encoding breaks down once we consider definitional equality. Specifically, compositionality (i.e. the relationship between ⟦M{N/x}⟧<sub>z</sub> and the encoding of N substituted into that of M) requires us to relate ⟦M{N/x}⟧<sub>z</sub> with (νx)(⟦M⟧<sub>z</sub>{{⟦N⟧<sub>y</sub>}/x} | !x(y).⟦N⟧<sub>y</sub>), which relies on reasoning up to *observational equivalence* of processes, a much stronger relation than our notion of definitional equality. Therefore it is *fundamentally* impossible for such an encoding to preserve our definitional equality, and thus it cannot preserve typing in the general case.

**A faithful embedding.** We now develop our embedding of the functional layer into the process layer, which is compatible with definitional equality. Our target calculus is reminiscent of a higher-order (in the sense of higher-order processes [23]) session calculus [19]. Our encoding ⟦−⟧ is inductively defined on kinds, types, session types, terms and processes. As usual in process encodings of the λ-calculus, the encoding of a term M is indexed by a result channel z, written ⟦M⟧<sub>z</sub>, where the behaviour of M may be observed.

The embedding is presented in Fig. 7, noting that the encoding extends straightforwardly to typing contexts, where functional contexts Ψ, x:τ are mapped to ⟦Ψ⟧, x:{⟦τ⟧}. The mapping of base kinds is straightforward. Dependent kinds Πx:τ.K rely on the monad for well-formedness and are encoded as (session) kinds of the form Πx:{⟦τ⟧}.⟦K⟧. The higher-kinded types in the functional layer are translated to the corresponding type-level constructs of the process layer, where all objects that must be type-kinded rely on the monad to satisfy this constraint. For instance, λx:τ.σ is mapped to the session-type abstraction λx:{⟦τ⟧}.⟦σ⟧ and the type-level application τ M is translated to ⟦τ⟧ {⟦M⟧<sub>c</sub>}. Given the observation above on embedding the dependent function type Πx:τ.σ, we translate it directly to ∀x:{⟦τ⟧}.⟦σ⟧; that is, functions from τ to σ are mapped to sessions that input *processes* implementing ⟦τ⟧ and then behave as ⟦σ⟧ accordingly. The encoding for monadic types simply realises the contextual nature of the monad by performing a sequence of inputs of the appropriate types (with the shared sessions being of ! type).

The mutually dependent nature of the framework requires us to extend the mapping to the process layer. Session types are mapped homomorphically (e.g. ⟦A ⊸ B⟧ = ⟦A⟧ ⊸ ⟦B⟧), with the exception of dependent inputs and outputs, which rely on the monad; similarly for type-level functions and application.

The encoding of λ-terms is guided by the embedding for types: the abstraction λx:τ.M is mapped to an input of a term of type ⟦τ⟧ with continuation ⟦M⟧_z; the application M N is mapped to the composition of the encoding of M on a fresh name with the corresponding output of {⟦N⟧_y}, which is then forwarded to the result channel z; monadic expressions are translated to the appropriate sequence of inputs, as dictated by the translation of the monadic type; and the translation of variables makes use of the monadic elimination form (since the encoding enforces that variables are always of monadic type), combined with forwarding to the appropriate result channel.
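As a concrete (and deliberately simplified) illustration of these term-level clauses, the following OCaml sketch renders the encoding on a pure λ-calculus fragment. The AST constructors, the name-generation scheme, and the process fragment chosen are our own assumptions, not the paper's definitions:

```ocaml
(* Toy ASTs: pure lambda-terms and a fragment of the target process calculus. *)
type tm = Var of string | Lam of string * tm | App of tm * tm

type proc =
  | Input of string * string * proc     (* c(x).P   : input on c, binding x *)
  | Output of string * proc * proc      (* c<{Q}>.P : output the quoted process Q on c *)
  | Par of proc * proc                  (* P | Q *)
  | New of string * proc                (* (nu c) P *)
  | Fwd of string * string              (* [c <-> z] : forwarder *)
  | Elim of string * string * proc      (* y <- x; P : monadic elimination *)

let gensym =
  let n = ref 0 in
  fun base -> incr n; Printf.sprintf "%s%d" base !n

(* [enc m z] encodes term [m] at result channel [z]:
   - abstractions become inputs on the result channel;
   - applications compose the encoded function at a fresh name with an
     output of the quoted encoded argument, then forward to [z];
   - variables use monadic elimination followed by forwarding. *)
let rec enc m z =
  match m with
  | Lam (x, body) -> Input (z, x, enc body z)
  | App (m1, m2) ->
      let c = gensym "c" and w = gensym "w" in
      New (c, Par (enc m1 c, Output (c, enc m2 w, Fwd (c, z))))
  | Var x ->
      let y = gensym "y" in
      Elim (y, x, Fwd (y, z))
```

On the identity function, `enc (Lam ("x", Var "x")) "c"` produces the process c(x). y ← x; [y ↔ c], matching the shape of the worked reduction example given shortly.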

The mapping for processes is mostly homomorphic, using the monad constructor as needed. The only significant exception is the encoding of monadic elimination, which must provide the encoded monadic term ⟦M⟧_c with the necessary channels. Since the session calculus does not support communication of free names, this is achieved by a sequence of outputs of fresh names combined with forwarding of the appropriate channel. To account for replicated sessions, we must first trigger the replication via an output, which is then forwarded accordingly.

We can illustrate our encoding via a simple example of an encoded function (we omit type annotations for conciseness):

$$\begin{array}{l} \llbracket (\lambda x.x)\,(\lambda x.\lambda y.y) \rrbracket_z = (\nu c)(\llbracket \lambda x.x \rrbracket_c \mid \overline{c}\langle \{\llbracket \lambda x.\lambda y.y \rrbracket_w\} \rangle.[c \leftrightarrow z]) \\ \quad = (\nu c)(c(x).\,y \leftarrow x; [y \leftrightarrow c] \mid \overline{c}\langle \{w(x).w(y).\,d \leftarrow y; [d \leftrightarrow w]\} \rangle.[c \leftrightarrow z]) \\ \quad \to^{+} z(x).z(y).\,d \leftarrow y; [d \leftrightarrow z] \;=\; \llbracket \lambda x.\lambda y.y \rrbracket_z \end{array}$$

#### **3.2 Properties of the Embedding**

We now state the key properties satisfied by our embedding, ultimately resulting in type preservation and operational correspondence. For conciseness, in the statements below we list only the cases for terms and processes, omitting those for types and kinds (see [32]). The key property is a notion of compositionality which, unlike in the sketch above, no longer falls outside of definitional equality.

#### **Lemma 3.1 (Compositionality)**

*1. Ψ; Γ; Δ ⊢ ⟦M{N/x}⟧_z = ⟦M⟧_z {{⟦N⟧_y}/x} :: z:⟦A{N/x}⟧.*
*2. Ψ; Γ; Δ ⊢ ⟦P{M/x}⟧ :: z:⟦A{M/x}⟧ iff Ψ; Γ; Δ ⊢ ⟦P⟧{{⟦M⟧_c}/x} :: z:⟦A⟧{{⟦M⟧_c}/x}.*

Given the dependently typed nature of the framework, establishing the key properties of the encoding must be done simultaneously (relying on some auxiliary results – see [32]).

#### **Theorem 3.2 (Preservation of Equality)**

*1. If Ψ ⊢ M = N : τ then ⟦Ψ⟧; ·; · ⊢ ⟦M⟧_z = ⟦N⟧_z :: z:⟦τ⟧.*
*2. If Ψ; Γ; Δ ⊢ P = Q :: z:A then ⟦Ψ⟧; ⟦Γ⟧; ⟦Δ⟧ ⊢ ⟦P⟧ = ⟦Q⟧ :: z:⟦A⟧.*

#### **Theorem 3.3 (Preservation of Typing)**

*1. If Ψ ⊢ M : τ then ⟦Ψ⟧; ·; · ⊢ ⟦M⟧_z :: z:⟦τ⟧.*
*2. If Ψ; Γ; Δ ⊢ P :: z:A then ⟦Ψ⟧; ⟦Γ⟧; ⟦Δ⟧ ⊢ ⟦P⟧ :: z:⟦A⟧.*

**Theorem 3.4 (Operational Correspondence).** *If Ψ; Γ; Δ ⊢ P :: z:A and Ψ ⊢ M : τ then:*


In Theorem 3.4, (a) is commonly referred to as operational completeness, with (b) establishing soundness. As exemplified above, our encoding satisfies a very precise operational correspondence with the original λ-terms.

### **4 Related and Future Work**

**Enriching Session Types via Type Structure.** Exploiting the linear logical foundations of session types, [25] considers a form of value dependencies where session types can state properties of exchanged data values, while the work [29] introduces the contextual monad in a simply-typed setting. Our development not only subsumes these two works, but goes beyond simple value dependencies by extending to a richer type structure and integrating dependencies with the contextual monad. Recently, [1] considers a non-conservative extension of linear logic-based session types with sharing, allowing true non-determinism. Their work includes dependent quantifications with shared channels, but their type syntax does *not* include free type variables, so the actual type dependencies do not arise (see [1, 37:8]). Thus none of the examples in this paper can be represented in [1]. The work [16] studies gradual session types. To the best of our knowledge, the main example in [1, Sect. 2] is *statically* representable in our framework as in the example of Sect. 1, where protocol actions depend on values that are communicated (or passed as function arguments).

In the context of multiparty session types, the theory of multiparty indexed session types is studied in [7], and implemented in a protocol description language [20]. The main aim of these works is to use indexed types to represent an arbitrary number of session *participants*. The work [31] extends [25] to multiparty sessions in order to treat value dependency across multiple participants. Extending our framework to multiparty [15] or non-logic based session types [14] is an interesting future topic.

**Combining Linear and Dependent Types.** Many works have studied the various challenges of integrating linearity in dependent functional type theories. We focus on the most closely related works. The work [6] introduced the Linear Logical Framework (LLF), integrating linearity with the LF [11] type theory, which was later extended to the Concurrent Logical Framework (CLF) [33], accounting for further linear connectives. Their theory is representable in our framework through the contextual monad (encompassing full intuitionistic linear logic), depending on linearly-typed processes that can express dependently typed functions (Sect. 3).

The work of [17] integrates linearity with type dependencies by extending LNL [2]. Their work is aimed at reasoning about imperative programs using a form of Hoare triples, requiring features that we do not study in this work, such as proof irrelevance and computationally irrelevant quantification. Formally, their type theory is extensional, which introduces significant technical differences from our intensional type theory, such as a realisability model in the style of NuPRL [10] to establish consistency.

Recently, [8] proposed an extension of LLF with first-class contexts (which may contain both linear and unrestricted hypotheses). While the contextual aspects of their theory are reminiscent of our contextual monad, their framework differs significantly from ours, since it is designed to enable higher-order abstract syntax (commonplace in the LF family of type theories), focusing on a type system for canonical LF objects with a meta-language that includes contexts and context manipulation. They do not consider additives since their integration with first-class contexts can break canonicity.

While none of the above works considers processes as primitive, their techniques should be useful for, e.g. developing algorithmic type-checking and integrating inductive and coinductive session types based on [18,26,30].

**Dependent Types and Higher-Order** π**-calculus.** The work [35] studies a form of dependent types where the type of processes takes the form of a mapping Δ from channels x to channel types T representing an interface of process P. The dependency is specified as Π(x:T)Δ, representing a channel abstraction of the environment. This notion is extended to an existential channel dependency type Σ(x:T)Δ to address fresh name creation [13,34]. Combining our process monad with dependent types can be regarded as an "interface" which describes explicit channel usages for processes. The main differences are (1) our dependent types are more general, treating full dependent families including terms and processes in types, while [13,34,35] study only channel dependency to environments (i.e. neither terms nor processes appear in types, only channels); and (2) our calculus emits only fresh names, not needing to handle the complex scoping mechanism treated in [13,34]. In this sense, the process monad provides an elegant framework to handle higher-order computations and assign non-trivial types to processes.

**Acknowledgements.** The authors would like to thank the anonymous reviewers for their comments and suggestions. This work is partially supported by EPSRC EP/K034413/1, EP/K011715/1, EP/L00058X/1, EP/N027833/1, EP/N028201/1 and NOVA LINCS (UID/CEC/04516/2013).

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **FabULous Interoperability for ML and a Linear Language**

Gabriel Scherer1,2(B) , Max New<sup>1</sup>, Nick Rioux<sup>1</sup>, and Amal Ahmed1,3

<sup>1</sup> Northeastern University, Boston, USA
maxnew@ccs.neu.edu, rioux.n@husky.neu.edu, A.Ahmed@northeastern.edu
<sup>2</sup> Inria Saclay, Palaiseau, France
gabriel.scherer@inria.fr
<sup>3</sup> Inria Paris, Paris, France

**Abstract.** Instead of a monolithic programming language trying to cover all features of interest, some programming systems are designed by combining together simpler languages that cooperate to cover the same feature space. This can improve usability by making each part simpler than the whole, but there is a risk of *abstraction leaks* from one language to another that would break expectations of the users familiar with only one or some of the involved languages.

We propose a formal specification for what it means for a given language in a multi-language system to be usable without leaks: it should embed into the multi-language in a *fully abstract* way, that is, its contextual equivalence should be unchanged in the larger system.

To demonstrate our proposed design principle and formal specification criterion, we design a multi-language programming system that combines an ML-like statically typed functional language and another language with linear types and linear state. Our goal is to cover a good part of the expressiveness of languages that mix functional programming and linear state (ownership), at only a fraction of the complexity. We prove that the embedding of ML into the multi-language system is fully abstract: functional programmers should not fear abstraction leaks. We show examples of combined programs demonstrating in-place memory updates and safe resource handling, and an implementation extending OCaml with our linear language.

### **1 Introduction**

Feature accretion is a common trend among mature but actively evolving programming languages, including C++, Haskell, Java, OCaml, Python, and Scala. Each new feature strives for generality and expressiveness, and may provide a large usability improvement to users of the particular problem domain or programming

**Note:** Due to severe space restrictions, many details have been omitted from this presentation of our work. We strongly encourage the reader to consult the complete version at https://arxiv.org/pdf/1707.04984.

© The Author(s) 2018

C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 146–162, 2018. https://doi.org/10.1007/978-3-319-89366-2_8

style it was designed to empower (e.g., XML documents, asynchronous communication, staged evaluation). But feature creep in general-purpose languages may also make it harder for programmers to master the language as a whole, degrade the user experience (e.g., leading to more cryptic error messages), require additional work on the part of tooling providers, and lead to fragility in language implementations.

A natural response to increased language complexity is to define subsets of the language designed for a better programming experience. For instance, a subset can be easier to teach (e.g., "Core" ML<sup>1</sup>, Haskell 98 as opposed to GHC Haskell, Scala mastery levels<sup>2</sup>); it can facilitate static analysis or decrease the risk of programming errors, while remaining sufficiently expressive for the target users' needs (e.g., MISRA C, Spark/Ada); it can enforce a common style within a company; or it can be designed to encourage a transition to deprecate some ill-behaved language features (e.g., strict Javascript).

Once a subset has been selected, it may be the case that users write whole programs purely in the subset (possibly using tooling to enforce that property), but programs will commonly rely on other libraries that are not themselves implemented in the same subset of the language. If users stay in the subset while using these libraries, they will only interact with the part of the library whose interface is expressible in the subset. But does the behavior of the library respect the expectations of users who only know the subset? When calling a function from within the subset breaks subset expectations, it is a sign of *leaky abstraction*.

How should we design languages with useful subsets that manage complexity and avoid abstraction leaks?

We propose to look at this question from a different, but equivalent, angle: instead of designing a single big monolithic language with some nicer subsets, we propose to consider *multi-language* programming systems where several smaller programming languages interact together to cover the same feature space. Each language or sub-combination of languages is a subset, in the above sense, of the multi-language, and there is a clear definition of *abstraction leaks* in terms of user experience: a user who only knows some of the languages of the system should be able to use the multi-language system, interacting with code written in the other languages, without having their expectations violated. If we write a program in Java and call a function that, internally, is implemented in Scala, there should be no surprises—our experience should be the same as when calling a pure Java function. Similarly, consider the subset of Haskell that does not contain IO (input-output as a type-tracked effect): the expectations of a user of this language, for instance in terms of valid equational reasoning, should not be violated by adding IO back to the language—in the absence of the abstraction-leaking unsafePerformIO.

We propose a *formal specification* for a "no abstraction leaks" guarantee that can be used as a design criterion to design new multi-language systems, with graceful interoperation properties. It is based on the formal notion of *full abstraction* which has previously been used to study the denotational semantics

<sup>1</sup> https://caml.inria.fr/pub/docs/u3-ocaml/ocaml-ml.html.

<sup>2</sup> http://www.scala-lang.org/old/node/8610.

of programming languages (Meyer and Sieber 1988; Milner 1977; Cartwright and Felleisen 1992; Jeffrey and Rathke 2005; Abramsky, Jagadeesan, and Malacaria 2000), and the formal property of compilers (Ahmed and Blume 2008, 2011; Devriese et al. 2016; New et al. 2016; Patrignani et al. 2015), but not for user-facing languages. A compiler C from a source language S to a target language T is *fully abstract* if, whenever two source terms s<sub>1</sub> and s<sub>2</sub> are indistinguishable in S, their translations C(s<sub>1</sub>) and C(s<sub>2</sub>) are indistinguishable in T. In a multi-language G + E formed of a general-purpose, user-friendly language G and a more advanced language E—one that provides an *e*scape hatch for *e*xperts to write code that can't be implemented in G—we say that E does not *leak* into G if the embedding of G into the multi-language G + E is fully abstract.
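To make the failure of full abstraction concrete, here is a small OCaml illustration (our own example, not taken from the paper): the two functions below are contextually equivalent in a pure, effect-free fragment of the language, but a context using mutable state—an "advanced" feature outside that fragment—can tell them apart, so the embedding of the pure fragment into the full language with state is not fully abstract:

```ocaml
(* For every *pure* argument f, these agree: f 0 + f 0 = 2 * f 0. *)
let g1 f = f 0 + f 0
let g2 f = 2 * f 0

(* A stateful context distinguishes them by counting how often its
   argument is called: g1 calls f twice, g2 only once. *)
let distinguish g =
  let count = ref 0 in
  g (fun _ -> incr count; !count)
```

Here `distinguish g1` yields 3 (the two calls return 1 and 2) while `distinguish g2` yields 2, even though no purely functional context can separate `g1` from `g2`.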

To demonstrate that our formal specification is reasonable, we design a novel multi-language programming system that satisfies it. Our multi-language λUL combines a general-purpose functional programming language λ<sup>U</sup> (unrestricted) of the ML family with an advanced language λ<sup>L</sup> (linear) with *linear types* and linear state. It is less convenient to program in λ<sup>L</sup>'s restrictive type system, but users can write programs in λ<sup>L</sup> that could not be written in λ<sup>U</sup>: they can use linear types, locally, to enforce resource usage protocols (typestate), and they can use linear state and the linear ownership discipline to write programs that do in-place update to allocate less memory, yet remain observationally pure.

Consider for example the following mixed-language program. The blue fragments are written in the general-purpose, user-friendly functional language, while the red fragments are written in the linear language. The boundaries UL and LU allow switching between languages. The program reads all lines from a file, accumulating them in a list, and concatenating it into a single string when the end-of-file (EOF) is reached.

```
let concat_lines path : String = UL(
loop (open LU(path)) LU(Nil)
where rec loop handle LU(acc : List String) =
  match line handle with
  | Next line LU(handle) -> loop handle LU(Cons line acc)
  | EOF handle -> close handle; LU(rev_concat "\n" acc))
```
The linear type system ensures that the file handle is properly closed: removing the close handle call would give a type error. On the other hand, only the parts concerned with the resource-handling logic need to be written in the red linear language; the user can keep all general-purpose logic (here, how to accumulate lines and what to do with them at the end) in the more convenient general-purpose blue language—and call this function from a blue-language program. Fine-grained boundaries allow users to rely on each language's strength and to use the advanced features only when necessary.

In this example, the file-handle API specifies that the call to line, which reads a line, returns the data at type ![String]. The latter represents how λ<sup>U</sup> values of type String can be put into a *lump* type to be passed to the linear world, where they are treated as opaque blackboxes that must be passed back to the ML world for consumption. For other examples, such as in-place list manipulation or transient operations on a persistent data structure, we will need a deeper form of interoperability where the linear world creates, dissects or manipulates λ<sup>U</sup> values. To enable this, our multi-language supports translation of types from one language to the other, using a *type compatibility* relation σ ∼ σ′ between λ<sup>U</sup> types σ and λ<sup>L</sup> types σ′.
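The lump discipline can be mimicked in plain OCaml with an abstract type: a lumped value can be passed around but never inspected. This is our own sketch of the idea, not the paper's formal definition; `lump` and `unlump` play the roles of the LU and UL boundaries:

```ocaml
module Lump : sig
  type 'a t                  (* the lump type [sigma]: opaque outside the module *)
  val lump : 'a -> 'a t      (* boundary into the linear world *)
  val unlump : 'a t -> 'a    (* boundary back to the ML world *)
end = struct
  type 'a t = Lump of 'a
  let lump v = Lump v
  let unlump (Lump v) = v
end

(* The "linear side" can only shuffle lumps around, e.g. pair them up;
   it has no operation for looking inside an ['a Lump.t]. *)
let swap_lumps (a, b) = (b, a)
```

Only code on the "ML side" of the boundary (here, code calling `Lump.unlump`) can consume the underlying value, mirroring how values of [σ] must be passed back to the ML world for consumption.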

We claim the following contributions:


### **2 The** *λ***<sup>U</sup> and** *λ***<sup>L</sup> Languages**

The unrestricted language λ<sup>U</sup> is a run-of-the-mill idealized ML language with functions, pairs, sums, iso-recursive types and polymorphism. It is presented in its explicitly typed form—we will not discuss type inference in this work. The full syntax is described in Fig. 1, and the typing rules in Fig. 2. The dynamic semantics is completely standard. Having binary sums, binary products and iso-recursive types lets us express algebraic datatypes in the usual way.

The novelty lies in the linear language λ<sup>L</sup>, which we present in several steps. As is common in λ-calculi with references, the small-step operational semantics is given for a language that is not exactly the surface language in which programs

**Fig. 1.** Unrestricted language: syntax

**Fig. 2.** Unrestricted language: static semantics

are written, because memory allocation returns *locations* that are not in the grammar of surface terms. Reductions are defined on *configurations*, a local store paired with a term in a slightly larger *internal* language. We have two type systems, a type system on surface terms, that does not mention locations and stores—which is the one a programmer needs to know—and a type system on configurations, which contains enough static information to reason about the dynamics of our language and prove subject reduction. Again, this follows the standard structure of syntactic soundness proofs for languages with a mutable store.

#### **2.1 The Core of** *λ***<sup>L</sup>**

Figure 3 presents the surface syntax of our linear language λ<sup>L</sup>. For the syntactic categories of types σ and expressions e, the last line contains the constructions related to the linear store, which we only discuss in Sect. 2.2.

In technical terms, our linear type system is exactly propositional intuitionistic linear logic, extended with iso-recursive types. For simplicity and because we did not need them, our current system also does not have polymorphism or additive/lazy pairs σ<sub>1</sub> & σ<sub>2</sub>. Additive pairs would be a trivial addition, but polymorphism would require more work when we define the multi-language semantics in Sect. 3.

In less technical terms, our type system can enforce that values be used *linearly*, meaning that they cannot be duplicated or erased, they have to be deconstructed

**Fig. 3.** Linear language: surface syntax

exactly once. Only some types have this linearity restriction; others allow duplication and sharing of values at will. We can think of linear values as *resources* to be spent wisely; for any linear value somewhere in a term, there can be only one way to access this value, so we can interpret the language as enforcing an *ownership* discipline where whoever points to a linear value owns it.

In particular, linear functions of type σ<sub>1</sub> ⊸ σ<sub>2</sub> must be called exactly once, and their results must in turn be consumed—they can safely capture linear resources. On the other hand, the non-linear, duplicable values are those at types of the form !σ—the *exponential* modality of linear logic. If the term e has duplicable type !σ, then the term copy e has type σ: this creates a local copy of the value that is uniquely owned by its receiver and must be consumed linearly.
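OCaml's type system cannot enforce this discipline statically, but the intent can be sketched with runtime checks (our own encoding, not part of λ<sup>L</sup>): a "linear" function is wrapped so that a second call fails, and a duplicable !σ value is modeled as a generator whose `copy` hands out a value the receiver uniquely owns:

```ocaml
(* A "linear" function: callable exactly once; reuse is a runtime error.
   This only *checks* linearity dynamically; lambda^L rejects reuse statically. *)
let linear f =
  let used = ref false in
  fun x ->
    if !used then failwith "linear value used twice"
    else (used := true; f x)

(* A duplicable value !sigma, modeled as a generator: each [copy]
   yields a copy that the receiver must then consume linearly. *)
type 'a bang = unit -> 'a
let share (v : 'a) : 'a bang = fun () -> v
let copy (s : 'a bang) : 'a = s ()
```

The contrast is the point of the sketch: `copy` may be applied to a `bang` value any number of times, while a `linear`-wrapped function fails loudly on its second use.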

This resource-usage discipline is enforced by the surface typing rules of λ<sup>L</sup>, presented in Fig. 4. They are exactly the standard (two-sided) logical rules of intuitionistic linear logic, annotated with program terms. The non-duplicability of linear values is enforced by the way contexts are merged by the inference rules: if e<sub>1</sub> is type-checked in the context Γ<sub>1</sub> and e<sub>2</sub> in Γ<sub>2</sub>, then the linear pair ⟨e<sub>1</sub>, e<sub>2</sub>⟩ is only valid in the combined context Γ<sub>1</sub> ⊎ Γ<sub>2</sub>. The (⊎) operation is partial; this combined context is defined only if the variables shared by Γ<sub>1</sub> and Γ<sub>2</sub> are duplicable—their type is of the form !σ. In other words, a variable at a non-duplicable type in Γ<sub>1</sub> ⊎ Γ<sub>2</sub> cannot possibly appear in both Γ<sub>1</sub> and Γ<sub>2</sub>: it must appear exactly once<sup>3</sup>.
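This partial merge is easy to sketch executably. The OCaml fragment below is our own simplification (the type grammar and context representation are invented for illustration): merging succeeds only when every variable shared by the two contexts carries a duplicable !-type:

```ocaml
(* Simplified types: duplicable (!sigma) or linear; the string is a type name. *)
type ty = Bang of string | Lin of string
type ctx = (string * ty) list

let duplicable = function Bang _ -> true | Lin _ -> false

(* Partial merge of two contexts: shared variables must be duplicable
   (and agree on their type); otherwise the merge is undefined. *)
let merge (g1 : ctx) (g2 : ctx) : ctx option =
  let ok =
    List.for_all
      (fun (x, t) ->
        match List.assoc_opt x g2 with
        | None -> true
        | Some t' -> t = t' && duplicable t)
      g1
  in
  if ok then
    Some (g1 @ List.filter (fun (x, _) -> not (List.mem_assoc x g1)) g2)
  else None
```

A shared `Bang` variable merges fine; a shared `Lin` variable makes the merge undefined, which is exactly what forbids duplicating a linear resource across the two subterms.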

The expression share e takes a term at some type σ and creates a "shared" term, whose value will be duplicable. Its typing rule uses a context of the form !Γ, which is defined as the pointwise application of the (!) connectives to all the types in Γ. In other words, the context of this rule must only have duplicable types: a term can only be made duplicable if it does not depend on linear resources from the context. Otherwise, duplicating the shared value could break the uniqueownership discipline on these linear resources.

Finally, the linear isomorphism notation for fold and unfold in Fig. 4 defines them as primitive functions, at the given linear function type, in the empty context – using them does not consume resources. This notation also means that, operationally, these two operations shall be inverses of each other. The rules for the linear store type Box 1 σ and Box 0 are described in Sect. 2.2.

<sup>3</sup> Standard presentations of linear logic force contexts to be completely distinct, but have a separate rule to duplicate linear variables, which is less natural for programming.

**Fig. 4.** Linear language: surface static semantics

#### **2.2 Linear Memory in** *λ***<sup>L</sup>**

The surface typing rules for the linear store are given at the end of Fig. 4. The linear type Box 1 σ represents a memory location that holds a value of type σ. The type Box 0 represents a location that has been allocated, but does not currently hold a value. The primitive operations to act on this type are given as linear isomorphisms: new allocates, turning a unit value into an empty location; conversely, free reclaims an empty location. Putting a value into the location and taking it out are expressed by box and unbox, which convert between a pair of an empty location and a value, of type (Box 0) ⊗ σ, and a full location, of type Box 1 σ.

For example, the following program takes a full reference and a value, and swaps the value with the content of the reference:
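Such a swap can be sketched in OCaml by modeling the Box primitives with runtime checks, since OCaml cannot enforce the linear discipline statically. The names `new_`, `free`, `box` and `unbox` mirror the isomorphisms in the text, but this encoding is our own assumption, not the paper's definition:

```ocaml
(* Runtime-checked model of the linear store:
   an 'a cellbox is either empty (Box 0) or full (Box 1 sigma). *)
type 'a cellbox = { mutable contents : 'a option }

let new_ () = { contents = None }             (* new   : 1 -o Box 0 *)
let free b = assert (b.contents = None)       (* free  : Box 0 -o 1 *)
let box (b, v) =                              (* box   : Box 0 (x) sigma -o Box 1 sigma *)
  assert (b.contents = None);
  b.contents <- Some v; b
let unbox b =                                 (* unbox : Box 1 sigma -o Box 0 (x) sigma *)
  match b.contents with
  | Some v -> b.contents <- None; (b, v)
  | None -> invalid_arg "unbox: empty box"

(* Swap a value with the content of a full box, reusing the same cell. *)
let swap (r, x) =
  let (l, y) = unbox r in
  (box (l, x), y)
```

Note that `swap` allocates nothing: it empties the cell, then refills the very same cell, which is the memory-reuse reading of the linear discipline.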

The programming style following from this presentation of linear memory is functional, or applicative, rather than imperative. Rather than insisting on the mutability of references—which is allowed by the linear discipline—we may think of the type Box 1 σ as representing the indirection through the heap that is implicit in functional programs. In a sense, we are not writing imperative programs with a mutable store, but rather making explicit the allocations and dereferences happening in a higher-level, purely functional language. In this view, empty cells allow memory reuse.

This view that Box 1 σ represents indirection through memory suggests we can encode lists of values of type σ by the type LinList σ ≝ μα. 1 ⊕ Box 1 (σ ⊗ α). The placement of the box inside the sum mirrors the fact that the empty list is represented as an immediate value in functional languages. From this type definition, one can write an in-place reverse function on lists of σ as follows:
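Under the same caveat that OCaml checks none of the ownership discipline statically, the in-place reversal described here can be sketched with mutable cons cells standing in for Box 1 (σ ⊗ α); the cell type and list helpers are our own encoding:

```ocaml
(* LinList sigma = mu a. 1 (+) Box 1 (sigma (x) a), with each box modeled
   as a heap-allocated cell whose tail pointer reversal reuses in place. *)
type 'a cell = { hd : 'a; mutable tl : 'a linlist }
and 'a linlist = Nil | Cons of 'a cell

let rec of_list = function
  | [] -> Nil
  | x :: xs -> Cons { hd = x; tl = of_list xs }

let rec to_list = function
  | Nil -> []
  | Cons c -> c.hd :: to_list c.tl

(* Pointer reversal: every cons cell of the input is reused (no allocation)
   to build the reversed list, mirroring rev_append on LinList. *)
let rev l =
  let rec rev_append acc = function
    | Nil -> acc
    | Cons c ->
        let rest = c.tl in
        c.tl <- acc;          (* reuse this cell for the reversed spine *)
        rev_append (Cons c) rest
  in
  rev_append Nil l
```

Read functionally, `rev_append` is the usual accumulator-passing reversal; read imperatively, the `c.tl <- acc` line is exactly the pointer-reversal step of the classic in-place algorithm.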

Our linear language λ<sup>L</sup> is a formal language that is not terribly convenient to program directly. We will not present a full surface language in this work, but one could easily define syntactic sugar to write the exact same function as follows:

One can read this function as the usual functional rev append function on lists, annotated with memory reuse information: if we assume we are the unique owner of the input list and won't need it anymore, we can reuse the memory of its cons cells (given in this example the name l) to store the reversed list. On the other hand, if you read the box and unbox as imperative operations, this code expresses the usual imperative pointer-reversal algorithm.

This double view of linear state occurs in other programming systems with linear state. It was recently emphasized in O'Connor et al. (2016), where the functional point of view is seen as easing formal verification, while the imperative view is used as a compilation technique to produce efficient C code from linear programs.

#### **2.3 Internal** *λ***<sup>L</sup> Syntax and Typing**

To give a dynamic semantics for λ<sup>L</sup> and prove it sound, we need to extend the language with explicit stores and store locations. Indeed, the allocating term new should reduce to a "fresh location" allocated in some store <sup>s</sup>, and neither are part of the surface-language syntax. The corresponding internal typing judgment is more complex, but note that users do not need to know about it to reason about correctness of surface programs. The internal typing is essential for the soundness proof, but also useful for defining the multi-language semantics in Sect. 3.

We work with *configurations* (s | e), which are pairs of a store s and a term e. Our internal typing judgment Ψ; Γ ⊢<sub>l</sub> s | e : σ checks configurations, not just terms, and relies not only on a typing context for variables Γ but also on a *store typing* Ψ, which maps the locations of the configuration to typing assumptions.

Unfortunately, due to space limits, we will not present this part of the type system – which is not directly exposed to users of the language. See some examples of reduction rules in Fig. 5, and the long version of this work.

#### **2.4 Reduction of Internal Terms**

In the long version of this work we give a reduction relation between linear configurations (s | e) →<sub>L</sub> (s′ | e′) and prove a subject reduction result.

**Theorem 1 (Subject reduction for** λ<sup>L</sup>**).** *If Ψ; Γ ⊢<sub>l</sub> s | e : σ and (s | e) →<sub>L</sub> (s′ | e′), then there exists a (unique) Ψ′ such that Ψ′; Γ ⊢<sub>l</sub> s′ | e′ : σ.*

### **3 Multi-language Semantics**

To formally define our multi-language semantics we create a combined language λUL which lets us compose term fragments from both λ<sup>U</sup> and λ<sup>L</sup> together, and we give an operational semantics to this combined language. Interoperability is enabled by specifying how to transport values across the language boundaries.

Multi-language systems in the wild are not defined in this way: both languages are given a semantics, by interpretation or compilation, in terms of a shared lowerlevel language (C, assembly, the JVM or CLR bytecode, or Racket's core forms), and the two languages are combined at that level. Our formal multi-language description can be seen as a model of such combinations, that gives a specification of the expected observable behavior of this language combination.

Another difference from multi-languages in the wild is our use of very finegrained language boundaries: a term written in one language can have its subterms written in the other, provided the type-checking rules allow it. Most multilanguage systems, typically using Foreign Function Interfaces, offer coarsergrained composition at the level of compilation units. Fine-grained composition of existing languages, as done in the Eco project (Barrett et al. 2016), is difficult because of semantic mismatches. In the full version of this work we demonstrate that fine-grained composition is a rewarding language design, enabling new programming patterns.

#### **3.1 Lump Type and Language Boundaries**

The core components of the multi-language semantics are shown in Fig. 6—the communication of values from one language to the other will be described in the next section. The multi-language λUL has two distinct syntactic categories of types, values, and expressions: those that come from λ<sup>U</sup> and those that come from λ<sup>L</sup>. Contexts, on the other hand, are mixed, and can have variables of both sorts. For a mixed context Γ, the notation !Γ only applies (!) to its linear variables.

The typing rules of λ<sup>U</sup> and λ<sup>L</sup> are imported into our multi-language system, working on those two separate categories of program. They need to be extended to handle mixed contexts Γ instead of their original single-language contexts. In the linear case, the rules look exactly the same. In the ML case, the typing rules implicitly duplicate all the variables in the context. It would be unsound to extend them to arbitrary linear variables, so they use a duplicable context !Γ.

To build interesting multi-language programs, we need a way to insert a fragment written in one language into a term written in the other. This is done using *language boundaries*, two new term formers LU(e) and UL(s:<sup>Ψ</sup> <sup>|</sup> <sup>e</sup>) that inject an ML term into the syntactic category of linear terms, and a linear configuration into the syntactic category of ML terms.

Of course, we need new typing rules for these term-level constructions, clarifying when it is valid to send a value from λ<sup>U</sup> into λ<sup>L</sup> and vice versa. It would be incorrect to allow sending any type from one language into the other—for instance, by adding the counterpart of our language boundaries in the syntax of types. Since values of linear types must be uniquely owned, they cannot possibly be sent to the ML side, whose type system cannot enforce unique ownership.

On the other hand, any ML value can safely be sent to the linear world. For closed types, we could provide a corresponding linear type (1 maps to !1, etc.), but an ML value may also be typed by an abstract type variable α, in which case we cannot know what the linear counterpart should be. Instead of trying to provide translations, we send any ML type σ to the *lump type* [σ], which embeds ML types into linear types. A lump is a black box, not a type translation: the linear language assumes nothing about the behavior of its values. The values of [σ] are of the form [v], where v : σ is an ML value that the linear world cannot use. More precisely, we only propagate the information that ML values are all duplicable, by sending σ to ![σ].

The typing rules for language boundaries insert lumps when going from λ<sup>U</sup> to λ<sup>L</sup>, and remove them when going back from λ<sup>L</sup> to λ<sup>U</sup>. In particular, arbitrary linear types cannot occur at the boundary; they must be of the form ![σ].
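By analogy, the boundary discipline can be mimicked in a mainstream language by an opaque wrapper type. The sketch below is our own illustration, not the paper's formalism; the names `Lump`, `boundary_LU`, and `boundary_UL` are hypothetical:

```python
class Lump:
    """Opaque box around an ML value: the linear side may pass a Lump
    around (it is duplicable, matching ![sigma]) but never inspect it."""
    __slots__ = ("_value",)

    def __init__(self, value):
        self._value = value

def boundary_LU(ml_value):
    """LU(e): an ML value crosses into the linear world as a lump."""
    return Lump(ml_value)

def boundary_UL(lumped):
    """UL(v): a lumped value crosses back to ML; the lump is removed."""
    assert isinstance(lumped, Lump), "only lumped values may cross back"
    return lumped._value
```

Composing the two boundaries is the identity on ML values, mirroring the boundary-cancellation property discussed later.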

#### **Fig. 6.** Multi-language: lump and boundaries

**Fig. 7.** Interoperability: static and dynamic semantics (excerpt)

Finally, boundaries have reduction rules: a term or configuration inside a boundary in reduction position is reduced until it becomes a value, and then a lump is added or removed depending on the boundary direction. Note that because the <sup>v</sup> in UL(s:<sup>Ψ</sup> <sup>|</sup> <sup>v</sup>) is at a duplicable type ![σ], we know by inversion that the store is empty.

#### **3.2 Interoperability: Static Semantics**

If the linear language could not interact with lumped values at all, our multi-language programs would be rather boring, as the only way for the linear extension to provide a value back to ML would be to have received it from λ<sup>U</sup> and pass it back unchanged (as in the lump embedding of Matthews and Findler (2009)). To provide real interaction, we give a way to extract a value out of a lump ![σ], use it at some linear type σ, and put it back in before sending the result to λ<sup>U</sup>.

The correspondence between intuitionistic types σ and linear types σ is specified by a heterogeneous *compatibility relation* σ σ, defined in full in Fig. 7. The specification of this relation is that if σ σ holds, then the spaces of values of ![σ] and σ are isomorphic: we can convert back and forth between them. When this relation holds, the term formers lump<sup>σ</sup> and <sup>σ</sup>unlump perform the conversion.

The term LU(e) turns an <sup>e</sup> : <sup>σ</sup> into a term of lumped type ![σ], and we need to unlump it with some <sup>σ</sup>unlump for a compatible <sup>σ</sup> σ to interact with it on the linear side. It is common to combine both operations, so we provide syntactic sugar for the combination: <sup>σ</sup>LU(e). Similarly, ULσ(e) first lumps a linear term and then sends the result to the ML world.

#### **3.3 Interoperability: Dynamic Semantics**

When the relation σ <sup>σ</sup> holds, we can define a relation <sup>v</sup> <sup>↔</sup><sup>σ</sup> <sup>v</sup> between the values of σ and the values of σ – see the long version of this work. It is functional in both directions: with our definition, v is uniquely determined from v and conversely. The reduction rule for (un)lumping is then defined in terms of this relation: when <sup>v</sup> <sup>↔</sup><sup>σ</sup> <sup>v</sup>, the two values are interconverted at the boundary.

#### **3.4 Full Abstraction from** *λ***<sup>U</sup> into** *λ***UL**

We can now state the major meta-theoretical result of this work: the proposed multi-language design extends the simple language λ<sup>U</sup> in a way that provably has, in a certain sense, "no abstraction leaks".

**Definition 1 (Contextual equivalence in** λ<sup>U</sup>**).** *We say that* e, e′ *such that* Γ ⊢<sup>u</sup> e, e′ : σ *are* contextually equivalent*, written* e ≈*ctx*<sup>u</sup> e′*, if, for any expression context* C[·] *such that* · ⊢<sup>u</sup> C[e] : 1*, the closed terms* C[e] *and* C[e′] *are equi-terminating.*

**Definition 2 (Contextual equivalence in** λUL**).** *We say that* e, e′ *such that* Γ ⊢<sup>lu</sup> e, e′ : σ *are* contextually equivalent*, written* e ≈*ctx*<sup>lu</sup> e′*, if, for any expression context* C[·] *such that* · ⊢<sup>lu</sup> C[e] : 1*, the closed terms* C[e] *and* C[e′] *are equi-terminating.*

**Theorem 2 (Full Abstraction).** *The embedding of* λ<sup>U</sup> *into* λUL *is fully abstract:*

### **4 Conclusion and Related Work**

Having a stack of usable, interoperable languages, extensions or dialects is at the forefront of the Racket approach to programming environments, in particular for teaching (Felleisen et al. 2004).

Our multi-language semantics builds on the seminal work by Matthews and Findler (2009), who gave a formal semantics of interoperability between a dynamically and a statically typed language. Others have followed the Matthews-Findler approach of designing multi-language systems with fine-grained boundaries—for instance, formalizing interoperability between a simply and dependently typed language (Osera et al. 2012); between a functional and typed assembly language (Patterson et al. 2017); between an ML-like and an affinely typed language, where linearity is enforced at runtime on the ML side using stateful contracts (Tov and Pucella 2010); and between the source and target languages of compilation to specify compiler correctness (Perconti and Ahmed 2014). However, all these papers address only the question of soundness of the multi-language; we propose a formal treatment of *usability* and absence of abstraction leaks.

The only prior work establishing that a language embeds into a multi-language in a fully abstract way is the work on fully abstract compilation by Ahmed and Blume (2011) and New et al. (2016), who show that their compiler's source language embeds into their source-target multi-language in a fully abstract way. The focus of that work, however, was on fully abstract compilation, not on the usability of user-facing languages.

The Eco project (Barrett et al. 2016) studies multi-language systems where user-exposed languages are combined in a very fine-grained way; it is closely related in that it studies the user experience in a multi-language system. Its choice of existing dynamic languages creates delicate interoperability issues (conflicting variable scoping rules, etc.) as well as performance challenges. We propose a different approach: designing new multi-languages from scratch, with interoperability in mind, to avoid legacy obstacles.

We are not aware of existing systems exploiting the simple idea of using promotion to capture uniquely-owned state and dereliction to copy it—common formulations would rather perform copies on the contraction rule.

The general idea that linear types can permit reuse of unused allocated cells is not new. In Wadler (1990), a system is proposed with both linear and non-linear types to attack precisely this problem. It is however more distant from standard linear logic and somewhat ad hoc; for example, there is no way to permanently turn a uniquely-owned value into a shared value. Instead it provides a local *borrowing* construction that comes with ad hoc restrictions necessary for safety. (The inability to *give up* unique ownership, which is essential in our list-programming examples, also seems to be missing from Rust, where one would need to perform a costly operation traversing the graph of the value to turn all pointers into Arc nodes.)

The RAML project (Hoffmann et al. 2012) also combines linear logic and memory reuse: its *destructive match* operator implicitly reuses consumed cells in new allocations occurring within the match body. Multi-languages give us the option to explore more explicit, flexible representations of those low-level concerns, without imposing the complexity on all programmers.

A recent related work is the Cogent language (O'Connor et al. 2016), in which linear state is also viewed as both functional and imperative – the latter view enabling memory reuse. The language design is interestingly reversed: in Cogent, the linear layer is the simple language that everyone uses, and the non-linear layer is a complex but powerful language that is used only when one really has to: namely, C.

Our linear language λ<sup>L</sup> is considerably simpler, and in several ways less expressive, than advanced programming languages based on linear logic (Tov and Pucella 2011), separation logic (Balabonski et al. 2016), or fine-grained permissions (Garcia et al. 2014): it is not designed to stand on its own, but to serve as a useful sidekick to a functional language, allowing safer resource handling.

One major simplification of our design compared to more advanced linear or separation-logic-based languages is that we do not separate physical locations from the logical capability/permission to access them (e.g., as in Ahmed et al. (2007)). This restricts expressiveness in well-understood ways (Fahndrich and DeLine 2002): shared values cannot point to linear values.

Alms (Tov and Pucella 2011), Quill (Morris 2016) and Linear Haskell (Bernardy et al. 2018) add linear types to a functional language, trying hard not to lose desirable usability properties, such as type inference or the genericity of polymorphic higher-order functions. This is very challenging; for example, Linear Haskell gives up on principality of inference<sup>4</sup>. Our multi-language design side-steps this issue, as the general-purpose language remains unchanged. Language boundaries are more rigid than an ideal no-compromise language, as they force users to preserve the distinction between the general-purpose and the advanced features; but it is precisely this compromise that gives a design of reduced complexity.

Finally, on the side of the semantics, our system is related to LNL (Benton 1994), a calculus for linear logic that, in a sense, is itself built as a multi-language system where (non-duplicable) linear types and (duplicable) intuitionistic types interact through a boundary. It is not surprising that our design contains an instance of this adjunction: for any σ there is a unique σ such that σ - !σ, and converting a σ value to this σ and back gives a !σ and is provably equivalent, by boundary cancellation, to just using share.

**Acknowledgments.** We thank our anonymous reviewers for their feedback, as well as Neelakantan Krishnaswami, François Pottier, Jennifer Paykin, Sylvie Boldot and Simon Peyton-Jones for our discussions on this work.

This work was supported in part by the National Science Foundation under grants CCF-1422133 and CCF-1453796, and the European Research Council under ERC Starting Grant SECOMP (715753). Any opinions, findings, and conclusions expressed in this material are those of the authors and do not necessarily reflect the views of our funding agencies.

### **References**


<sup>4</sup> Thanks to Stephen Dolan for pointing out that λf.λx. f x has several incompatible Linear Haskell types.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Concurrency

# **Automata for True Concurrency Properties**

Paolo Baldan(B) and Tommaso Padoan

Dipartimento di Matematica, Università di Padova, Padua, Italy {baldan,padoan}@math.unipd.it

**Abstract.** We present an automata-theoretic framework for the model checking of true concurrency properties. These are specified in a fixpoint logic, corresponding to history-preserving bisimilarity, capable of describing events in computations and their dependencies. The models of the logic are event structures or any formalism which can be given a causal semantics, like Petri nets. Given a formula and an event structure satisfying suitable regularity conditions we show how to construct a parity tree automaton whose language is non-empty if and only if the event structure satisfies the formula. The automaton, due to the nature of event structure models, is usually infinite. We discuss how it can be quotiented to an equivalent finite automaton, where emptiness can be checked effectively. In order to show the applicability of the approach, we discuss how it instantiates to finite safe Petri nets. As a proof of concept we provide a model checking tool implementing the technique.

### **1 Introduction**

Behavioural logics with the corresponding verification techniques are a cornerstone of automated verification. For concurrent and distributed systems, so-called true concurrent models can be an appropriate choice, since they describe not only the possible steps in the evolution of the system but also their causal dependencies. A widely used foundational model in this class is given by Winskel's event structures [1]. They describe the behaviour of a system in terms of events in computations and two dependency relations: a partial order modelling causality and an additional relation modelling conflict. A survey on the use of such causal models can be found in [2]. Recently they have been used in the study of concurrency in weak memory models [3,4], for process mining and differencing [5], and in the study of atomicity [6] and information flow [7] properties.

Operational models can be abstracted by considering true concurrent equivalences that range from hereditary history preserving bisimilarity to the coarser pomset and step equivalences (see, e.g., [8]) and behavioural logics expressing causal properties (see, e.g., [9–14] for a necessarily partial list and [15–19] for some related verification techniques).

Event-based logics have been recently introduced [20,21], capable of uniformly characterising the equivalences in the true concurrent spectrum. Their formulae include variables which are bound to events in computations and describe their dependencies. While the relation between operational models, behavioural equivalences and event-based true concurrent logics is well understood, the corresponding model checking problem has received limited attention.

We focus on the logic referred to as Lhp in [20], corresponding to a classical equivalence in the spectrum, i.e., history preserving (hp-)bisimilarity [22–24].

Decidability of model checking is not obvious since event structure models are infinite even for finite state systems, and the possibility of expressing properties that depend on the past often leads to undecidability [25]. In a recent paper [26] we proved the decidability of the problem for the alternation-free fragment of the logic Lhp over a class of event structures satisfying a suitable regularity condition [27] referred to as strong regularity. The proof relies on a tableau-based model checking procedure. Despite the infiniteness of the model, a suitable stop condition can be identified, ensuring that a successful finite tableau can be generated if and only if the formula is satisfied by the model.

Besides the limitation to the alternation-free fragment of Lhp, a shortcoming of the approach is that a direct implementation of the procedure can be extremely inefficient. Roughly speaking, the problem is that in the search for a successful tableau, branches that are, in some sense, equivalent are explored several times.

In this paper we devise an automata-theoretic technique, in the style of [28], for model checking Lhp that works for the full logic, without constraints on the alternation depth. Besides providing an alternative approach for model-checking Lhp, amenable to a more efficient implementation, this generalises the decidability result of [26] to the full logic Lhp. Given a formula in Lhp and a strongly regular event structure, the procedure generates a parity tree automaton. Satisfaction is reduced to non-emptiness: the event structure satisfies the formula if and only if the automaton accepts a non-empty language.

The result is not directly usable for practical purposes since the automaton is infinite for any non-trivial event structure. However, an equivalence on states can be defined such that the quotiented automaton accepts the same language as the original one. Whenever this equivalence is of finite index, the quotient automaton is finite, so that satisfaction of the formula can be checked effectively on it. We show that for all strongly regular event structures a canonical equivalence of finite index always exists.

The procedure is developed abstractly on event structures. A concrete algorithm for a specific formalism requires the effectiveness of the chosen equivalence on states. We develop a concrete instantiation of the algorithm for finite safe Petri nets. It is implemented in a tool, wishfully called the *True Concurrency Workbench* (TCWB), written in Haskell. Roughly, the search for an accepting run of the automaton can be seen as an optimisation of the procedure for building a successful tableau in [26], where the graph structure underlying the automaton helps in reusing the information already discovered. Some tests reveal that the TCWB is considerably more efficient than the direct implementation of the tableau-based procedure (which could not manage most of the examples in the TCWB repository).

The rest of the paper is structured as follows. In Sect. 2 we review event structures, strong regularity and the logic Lhp of interest in the paper. In Sect. 3 we introduce (infinite state) parity tree automata and we show how the model checking problem for Lhp on strongly regular pess can be reduced to the non-emptiness of the language of such automata. In Sect. 4 we discuss the instantiation of the approach to Petri nets. Finally, in Sect. 5 we discuss some related work and outline directions of future research. Due to space limitations, proofs are only sketched.

### **2 Event Structures and True Concurrent Logic**

We introduce prime event structures [1] and the subclass of strongly regular event structures on which our model checking approach will be developed. Then we present the logic for true concurrency of interest in the paper.

#### **2.1 Prime Event Structures and Regularity**

Throughout the paper 𝔼 is a fixed countable set of events, Λ a finite set of labels, ranged over by a, b, c, . . ., and λ : 𝔼 → Λ a labelling function.

**Definition 1 (prime event structure).** *A (*Λ*-labelled)* prime event structure *(*pes*) is a tuple* E = ⟨E, ≤, #⟩*, where* E ⊆ 𝔼 *is the set of* events *and* ≤*,* # *are binary relations on* E*, called* causality *and* conflict *respectively, such that: 1.* ≤ *is a partial order and* ⌈e⌉ = {e′ ∈ E | e′ ≤ e} *is finite for all* e ∈ E*; 2.* # *is irreflexive, symmetric and inherited along* ≤*, i.e., for all* e, e′, e″ ∈ E*, if* e # e′ ≤ e″ *then* e # e″*.*

*The* pes*s* E₁ = ⟨E₁, ≤₁, #₁⟩ *and* E₂ = ⟨E₂, ≤₂, #₂⟩ *are* isomorphic*, written* E₁ ∼ E₂*, when there is a bijection* ι : E₁ → E₂ *such that for all* e₁, e₁′ ∈ E₁ *it holds that* e₁ ≤₁ e₁′ *iff* ι(e₁) ≤₂ ι(e₁′)*,* e₁ #₁ e₁′ *iff* ι(e₁) #₂ ι(e₁′)*, and* λ(e₁) = λ(ι(e₁))*.*
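For intuition, a *finite* fragment of a pes can be encoded directly. The sketch below is our own illustration (the paper's event structures are generally infinite): it stores causality and conflict as sets of pairs, closes conflict under inheritance along ≤, and checks the axioms of Definition 1 on the finite data.

```python
class PES:
    """Illustrative finite prime event structure (Definition 1)."""

    def __init__(self, events, leq, conflict, label):
        self.events = set(events)
        # causality <=, stored as pairs (d, f) meaning d <= f;
        # reflexive pairs are added so that causes(e) contains e itself
        self.leq = set(leq) | {(e, e) for e in self.events}
        # conflict #, symmetrised here
        self.conflict = set(conflict) | {(b, a) for (a, b) in conflict}
        self.label = dict(label)
        self._close_and_check()

    def causes(self, e):
        # the finite causal history of e (axiom 1 of Definition 1)
        return {d for (d, f) in self.leq if f == e}

    def _close_and_check(self):
        # axiom 2: close conflict under inheritance along <=,
        # i.e. e # e' and e' <= e'' imply e # e''
        changed = True
        while changed:
            changed = False
            for (e, e1) in list(self.conflict):
                for (d, f) in self.leq:
                    if d == e1 and (e, f) not in self.conflict:
                        self.conflict |= {(e, f), (f, e)}
                        changed = True
        # conflict must stay irreflexive, otherwise the input was inconsistent
        assert all(a != b for (a, b) in self.conflict)
```

For example, a fragment of the pes of Fig. 1a with events c⁰ ≤ a⁰ and a⁰ # b⁰ can be built as `PES({"c0", "a0", "b0"}, {("c0", "a0")}, {("a0", "b0")}, {"c0": "c", "a0": "a", "b0": "b"})`.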

In the following, we will assume that the components of a pes E are named as in the definition above, possibly with subscripts. The concept of concurrent computation for pess is captured by the notion of configuration.

**Definition 2 (configuration).** *A* configuration *of a* pes E *is a finite set of events* C ⊆ E *that is* consistent *(i.e.,* ¬(e # e′) *for all* e, e′ ∈ C*) and* causally closed *(i.e.,* ⌈e⌉ ⊆ C *for all* e ∈ C*). We denote by* C(E) *the set of configurations of* E*.*
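Using the same finite encoding of causality and conflict as sets of pairs, Definition 2 can be checked mechanically; `is_configuration` is a hypothetical helper of ours, not part of the paper's development:

```python
def is_configuration(C, leq, conflict):
    """Definition 2: C is a configuration iff it is consistent
    (conflict-free) and causally closed under <=."""
    C = set(C)
    consistent = all((e, f) not in conflict for e in C for f in C)
    closed = all(d in C for (d, f) in leq if f in C)
    return consistent and closed
```

On the fragment of Fig. 1a, {c⁰} is a configuration, while {a⁰} is not (it is not causally closed: c⁰ ≤ a⁰ is missing).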

The evolution of a pes can be represented by a transition system over configurations, with the empty configuration as initial state.

**Definition 3 (transition system).** *Let* E *be a* pes *and let* C ∈ C(E)*. Given* e ∈ E ∖ C *such that* C ∪ {e} ∈ C(E)*, and* X, Y ⊆ C *with* X ⊆ ⌈e⌉ *and* Y ∩ ⌈e⌉ = ∅*, we write* C −X,Y<e→λ(e) C ∪ {e}*. The set of* enabled *events at a configuration* C *is defined as* en(C) = {e ∈ E | C −e→ C′}*. The* pes *is called* k-bounded *for some* k ∈ ℕ *(or simply* bounded*) if* |en(C)| ≤ k *for all* C ∈ C(E)*.*

**Fig. 1.** (a) A pes E<sup>N</sup> associated with the net N in (b) via its unfolding (c).

Transitions are labelled by the executed event e. In addition, they report its label λ(e), a subset X of its causes, and a set of events Y ⊆ C concurrent with e. When X or Y is empty it is normally omitted, e.g., we write C −X<e→λ(e) C′ for C −X,∅<e→λ(e) C′ and C −e→λ(e) C′ for C −∅,∅<e→λ(e) C′.
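Continuing the finite sketch, the enabled events of Definition 3 can be computed by testing which single-event extensions are again configurations (illustrative code with our own names):

```python
def enabled(C, events, leq, conflict):
    """en(C) from Definition 3: events e outside C such that
    C union {e} is again a configuration (consistent, causally closed)."""
    result = set()
    for e in set(events) - set(C):
        D = set(C) | {e}
        consistent = all((x, y) not in conflict for x in D for y in D)
        closed = all(d in D for (d, f) in leq if f in D)
        if consistent and closed:
            result.add(e)
    return result
```

On the fragment of Fig. 1a, en(∅) = {b⁰, c⁰} (a⁰ is blocked by its missing cause c⁰), and after executing c⁰ both a⁰ and b⁰ become enabled.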

The pes modelling a non-trivial system is normally infinite. We will work on a subclass identified by finitarity requirements on the possible substructures.

**Definition 4 (residual).** *Let* E *be a* pes*. For a configuration* C ∈ C(E)*, the* residual *of* E *after* C *is defined as* E[C] = {e ∈ E ∖ C | C ∪ {e} *consistent*}*.*

The residual of E can be seen as a pes, endowed with the restriction of causality and conflict of E. Intuitively, it represents the pes that remains to be executed after the computation expressed by C. Given C ∈ C(E) and X ⊆ C, we denote by E[C] ∪ X the pes obtained from E[C] by adding the events in X with the causal dependencies they had in the original pes E.
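In the finite sketch, the underlying event set of the residual of Definition 4 is a one-line filter (our own illustrative helper):

```python
def residual(C, events, conflict):
    """E[C] from Definition 4: the events outside C that are
    consistent with everything already executed in C."""
    return {e for e in set(events) - set(C)
            if all((e, c) not in conflict for c in C)}
```

On the fragment of Fig. 1a, the residual after {b⁰} drops a⁰ (which conflicts with b⁰) and keeps c⁰.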

**Definition 5 (strong regularity).** *A* pes E *is called* strongly regular *when it is bounded and for each* k ∈ ℕ *the set* {E[C] ∪ {e₁, . . . , eₖ} | C ∈ C(E) ∧ e₁, . . . , eₖ ∈ C} *is finite up to isomorphism of* pes*s.*

Strong regularity [26] is obtained from the notion of regularity in [27], by replacing residuals with residuals extended with a bounded number of events from the past. Intuitively, this is important since we are interested in history dependent properties. We will later show in Sect. 4 that the pess associated with finite safe Petri nets, i.e., the regular trace pess [27], are strongly regular.

A simple pes is depicted in Fig. 1a. Graphically, curly lines represent immediate conflicts and the causal partial order proceeds upwards along the straight lines. Events are denoted by their labels, possibly with superscripts. For instance, in E<sup>N</sup>, the events a<sup>0</sup> and b<sup>0</sup>, labelled by a and b respectively, are in conflict. Event c<sup>0</sup> causes the events a<sup>i</sup> and is concurrent with b<sup>i</sup> for all i ∈ ℕ. It is an infinite pes, associated with the Petri net N in Fig. 1b in a way that will be discussed in Sect. 4.1; hence it is strongly regular by Corollary 1. It has five (equivalence classes of) residuals extended with an event from the past: E<sup>N</sup>[{b<sup>0</sup>}] ∪ {b<sup>0</sup>}, E<sup>N</sup>[{c<sup>0</sup>, b<sup>0</sup>}] ∪ {b<sup>0</sup>}, E<sup>N</sup>[{c<sup>0</sup>, a<sup>0</sup>}] ∪ {c<sup>0</sup>}, E<sup>N</sup>[{c<sup>0</sup>, a<sup>0</sup>}] ∪ {a<sup>0</sup>}, and E<sup>N</sup>[{c<sup>0</sup>, b<sup>0</sup>, a<sup>1</sup>}] ∪ {b<sup>0</sup>}.

#### **2.2 True Concurrent Logic**

The logic of interest for this paper, originally defined in [20], is a Hennessy-Milner style logic that allows one to specify the dependencies (causality and concurrency) between events in computations.

Logic formulae include event variables, from a fixed denumerable set *Var*, denoted by x, y, . . .. Tuples of variables like x1, . . . , xn are denoted by a corresponding boldface letter **x** and, abusing notation, tuples are often used as sets. The logic includes diamond and box modalities. The formula ⟨**x**, **y** < a z⟩ ϕ holds in a configuration when an a-labelled event e is enabled which causally depends on the events bound to **x** and is concurrent with those bound to **y**. The event e is executed, and then the formula ϕ must hold, with e bound to the variable z. Dually, [[**x**, **y** < a z]] ϕ is satisfied when all a-labelled events causally dependent on **x** and concurrent with **y** lead to a configuration where ϕ holds.

For dealing with fixpoint operators we fix a denumerable set <sup>X</sup> <sup>a</sup> of *abstract propositions*, ranged over by X, Y , . . . . Each abstract proposition X has an arity *ar* (X) and it represents a formula with *ar* (X) (unnamed) free event variables. Then, for **<sup>x</sup>** such that <sup>|</sup>**x**<sup>|</sup> <sup>=</sup> *ar* (X), we write <sup>X</sup>(**x**) to indicate the abstract proposition X whose free event variables are named **x**.

**Definition 6 (syntax).** *The syntax of* <sup>L</sup>hp *over the sets of event variables Var , abstract propositions* <sup>X</sup> <sup>a</sup> *and labels* <sup>Λ</sup> *is defined as follows:*

$$\begin{array}{rcl} \varphi & ::= & X(\mathbf{x}) \mid \mathsf{T} \mid \varphi \land \varphi \mid \langle \mathbf{x}, \overline{\mathbf{y}} < \mathbf{a} \, z \rangle \, \varphi \mid \nu X(\mathbf{x}).\varphi \\ & & \mid \; \mathsf{F} \mid \varphi \lor \varphi \mid \left[ \mathbf{x}, \overline{\mathbf{y}} < \mathbf{a} \, z \right] \varphi \mid \mu X(\mathbf{x}).\varphi \end{array}$$

For a formula ϕ we denote by *fv*(ϕ) its free event variables, defined in the obvious way. Just note that the modalities act as binders for the variable representing the executed event, hence *fv*(⟨**x**, **y** < a z⟩ ϕ) = *fv*([[**x**, **y** < a z]] ϕ) = (*fv*(ϕ) ∖ {z}) ∪ **x** ∪ **y**. For formulae νX(**x**).ϕ and μX(**x**).ϕ we require that *fv*(ϕ) = **x**. The propositions in ϕ not bound by μ or ν are denoted by *fp*(ϕ). When both *fv*(ϕ) and *fp*(ϕ) are empty we say that ϕ is *closed*. When **x** or **y** are empty they are omitted, e.g., we write ⟨a z⟩ ϕ for ⟨∅, ∅ < a z⟩ ϕ.

For example, the formula <sup>ϕ</sup><sup>1</sup> <sup>=</sup> |<sup>c</sup> <sup>x</sup>|(|x < <sup>a</sup> <sup>y</sup>|<sup>T</sup> ∧ |x < <sup>b</sup> <sup>z</sup>|T) requires that, after the execution of a c-labelled event, one can choose between a causally dependent a-labelled event and a concurrent b-labelled event. It is satisfied by <sup>E</sup><sup>N</sup> in Fig. 1a. Instead <sup>ϕ</sup><sup>2</sup> <sup>=</sup> |<sup>c</sup> <sup>x</sup>|(|x < <sup>a</sup> <sup>y</sup>|<sup>T</sup> ∧ |x < <sup>b</sup> <sup>z</sup>|T) requiring both events to be concurrent would be false. Moving to infinite computations, consider <sup>ϕ</sup><sup>3</sup> = [[<sup>b</sup> <sup>x</sup>]]νZ(x).|<sup>c</sup> <sup>z</sup>||z < <sup>b</sup> <sup>y</sup>|T∧[[x < <sup>b</sup> <sup>y</sup>]]Z(y), expressing that all non-empty causal chains of b-labelled events reach a state where it is possible to execute two concurrent events labelled <sup>c</sup> and <sup>b</sup>, respectively. Then <sup>ϕ</sup><sup>3</sup> holds in <sup>E</sup><sup>N</sup> . Another formula satisfied by <sup>E</sup><sup>N</sup> is <sup>ϕ</sup><sup>4</sup> <sup>=</sup> |<sup>c</sup> <sup>x</sup>||x < <sup>b</sup> <sup>y</sup>|νX(x, y).|y, x < <sup>b</sup> <sup>z</sup>|X(x, z) requiring the existence of an infinite causal chain of b-labelled events, concurrent with a c-labelled event.
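The grammar above can be represented as a small abstract syntax tree. The following sketch is a hypothetical encoding of ours, covering only T, conjunction and the diamond modality; `free_vars` implements the fv(·) clauses just described, and the last lines build the shape of ϕ1 above:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Top:
    pass

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Diamond:
    causes: Tuple[str, ...]      # x: the new event must depend on these
    concurrent: Tuple[str, ...]  # y: ... and be concurrent with these
    label: str                   # a: required label of the event
    binder: str                  # z: bound to the executed event in the body
    body: object

def free_vars(phi):
    """fv(phi): modalities bind their variable z in the body."""
    if isinstance(phi, Top):
        return frozenset()
    if isinstance(phi, And):
        return free_vars(phi.left) | free_vars(phi.right)
    if isinstance(phi, Diamond):
        return (free_vars(phi.body) - {phi.binder}) \
            | frozenset(phi.causes) | frozenset(phi.concurrent)
    raise TypeError(phi)

# the shape of phi1: <c x>(<x < a y>T  and  <x < b z>T)
phi1 = Diamond((), (), "c", "x",
               And(Diamond(("x",), (), "a", "y", Top()),
                   Diamond(("x",), (), "b", "z", Top())))
```

As expected, `free_vars(phi1)` is empty (the formula is closed), while the body under the outermost diamond has the single free variable x.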

The logic Lhp is interpreted over pess. The satisfaction of a formula is defined with respect to a configuration C and a (total) function η : *Var* → E, called an *environment*, that binds the free variables of ϕ to events in C. Namely, if *Env*<sup>E</sup> denotes the set of environments, the semantics of a formula will be a set of pairs in C(E) × *Env*<sup>E</sup>. The semantics of Lhp also depends on a *proposition environment* π : X<sup>a</sup> → 2<sup>C(E)×*Env*E</sup> which provides an interpretation for propositions. In order to ensure that the semantics of a formula depends only on the events associated with its free variables, and is independent of the naming of the variables, it is required that if (C, η) ∈ π(X(**x**)) and η′(**y**) = η(**x**) pointwise, then (C, η′) ∈ π(X(**y**)). We denote by *PEnv*<sup>E</sup> the set of proposition environments, ranged over by π.

We can now give the semantics of logic Lhp. Given an event environment η and an event e we write η[x → e] for the updated environment which maps x to e. Similarly, for a proposition environment π and S ⊆ C(E) × *Env* <sup>E</sup> , we write <sup>π</sup>[Z(**x**) → <sup>S</sup>] for the corresponding update.

**Definition 7 (semantics).** *Let* <sup>E</sup> *be a* pes*. The denotation of a formula* <sup>ϕ</sup> *in* <sup>L</sup>hp *is given by the function* {|·|}<sup>E</sup> : <sup>L</sup>hp <sup>→</sup> *PEnv* <sup>E</sup> <sup>→</sup> <sup>2</sup>C(E)×*Env*<sup>E</sup> *defined inductively as follows, where we write* {|ϕ|}<sup>E</sup> <sup>π</sup> *instead of* {|ϕ|}<sup>E</sup> (π)*:*

$$\begin{array}{rcl}
\{\!|\mathsf{T}|\!\}^{\mathcal{E}}_{\pi} & = & \mathcal{C}(\mathcal{E}) \times \mathit{Env}_{\mathcal{E}} \\
\{\!|\mathsf{F}|\!\}^{\mathcal{E}}_{\pi} & = & \emptyset \\
\{\!|Z(\mathbf{y})|\!\}^{\mathcal{E}}_{\pi} & = & \pi(Z(\mathbf{y})) \\
\{\!|\varphi_1 \land \varphi_2|\!\}^{\mathcal{E}}_{\pi} & = & \{\!|\varphi_1|\!\}^{\mathcal{E}}_{\pi} \cap \{\!|\varphi_2|\!\}^{\mathcal{E}}_{\pi} \\
\{\!|\varphi_1 \lor \varphi_2|\!\}^{\mathcal{E}}_{\pi} & = & \{\!|\varphi_1|\!\}^{\mathcal{E}}_{\pi} \cup \{\!|\varphi_2|\!\}^{\mathcal{E}}_{\pi} \\
\{\!|\langle \mathbf{x}, \overline{\mathbf{y}} < \mathbf{a}\, z\rangle\, \varphi|\!\}^{\mathcal{E}}_{\pi} & = & \{(C, \eta) \mid \exists e.\ C \xrightarrow[\mathbf{a}]{\eta(\mathbf{x}), \eta(\mathbf{y}) < e} C' \land (C', \eta[z \mapsto e]) \in \{\!|\varphi|\!\}^{\mathcal{E}}_{\pi}\} \\
\{\!|[\mathbf{x}, \overline{\mathbf{y}} < \mathbf{a}\, z]\, \varphi|\!\}^{\mathcal{E}}_{\pi} & = & \{(C, \eta) \mid \forall e.\ C \xrightarrow[\mathbf{a}]{\eta(\mathbf{x}), \eta(\mathbf{y}) < e} C' \Rightarrow (C', \eta[z \mapsto e]) \in \{\!|\varphi|\!\}^{\mathcal{E}}_{\pi}\} \\
\{\!|\nu Z(\mathbf{x}).\varphi|\!\}^{\mathcal{E}}_{\pi} & = & \mathit{gfp}(f_{\varphi,Z(\mathbf{x}),\pi}) \\
\{\!|\mu Z(\mathbf{x}).\varphi|\!\}^{\mathcal{E}}_{\pi} & = & \mathit{lfp}(f_{\varphi,Z(\mathbf{x}),\pi})
\end{array}$$

*where* f_{ϕ,Z(**x**),π} : 2^(C(E)×*Env*E) → 2^(C(E)×*Env*E) *is defined by* f_{ϕ,Z(**x**),π}(S) = {|ϕ|}^E_{π[Z(**x**)→S]} *and* gfp(f_{ϕ,Z(**x**),π}) *(resp.* lfp(f_{ϕ,Z(**x**),π})*) denotes the corresponding greatest (resp. least) fixpoint. We say that a* pes E *satisfies a formula* ϕ*, written* E |= ϕ*, if* (∅, η) ∈ {|ϕ|}^E_π *for all environments* η *and proposition environments* π*.*

The semantics of the boolean operators is standard. The formula ⟨**x**, **y** < a z⟩ ϕ holds in (C, η) when configuration C enables an a-labelled event e that causally depends on (at least) the events bound to the variables in **x** and is concurrent with (at least) those bound to the variables in **y**; once executed, e produces a new configuration C′ = C ∪ {e} which, paired with the environment η′ = η[z → e], satisfies the formula ϕ. Dually, [[**x**, **y** < a z]] ϕ holds when all a-labelled events executable from C, caused by **x** and concurrent with **y**, lead to a configuration where ϕ is satisfied.

The fixpoints corresponding to the formulae νZ(**x**).ϕ and μZ(**x**).ϕ are guaranteed to exist by the Knaster–Tarski theorem, since the set 2^(C(E)×*Env*E) ordered by subset inclusion is a complete lattice and the functions f_{ϕ,Z(**x**),π} are monotonic.
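On a finite powerset lattice these fixpoints can be computed by plain Kleene iteration. The following is a minimal sketch (the function names and the toy reachability example are ours, not from the paper; for the infinite lattices used above the fixpoints exist but are not computed this way):

```python
def lfp(f, bottom=frozenset()):
    """Least fixpoint of a monotone f on a finite powerset lattice,
    obtained by iterating f from the bottom element until stable."""
    s = bottom
    while True:
        t = f(s)
        if t == s:
            return s
        s = t

def gfp(f, top):
    """Greatest fixpoint, iterating f downwards from the top element."""
    s = top
    while True:
        t = f(s)
        if t == s:
            return s
        s = t

# Toy example: forward reachability from node 0 as a least fixpoint.
edges = {0: {1}, 1: {2}, 2: {2}, 3: set()}
f = lambda s: frozenset({0}) | frozenset(x for v in s for x in edges[v])
print(sorted(lfp(f)))  # → [0, 1, 2]
```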

### **3 Automata-Based Model Checker**

We introduce nondeterministic parity tree automata and show how the model checking problem for Lhp on strongly regular pess can be reduced to the non-emptiness of the language of such automata. The automaton naturally generated from a pes and a formula has an infinite number of states. We discuss how the automaton can be quotiented to a finite one accepting the same language, and thus potentially useful for model checking purposes.

#### **3.1 Infinite Parity Tree Automata**

Automata on infinite trees have proved to be a powerful tool for various problems in the setting of branching temporal logics. Here we focus on nondeterministic parity tree automata [29], with some (slightly) non-standard features. We work on k-trees (rather than on binary trees), a choice that will simplify the presentation, and we allow for possibly infinite-state automata.

When automata are used for model checking purposes it is standard to restrict to unlabelled trees. A k*-bounded branching tree*, or k*-tree* for short, is a subset T ⊆ [1, k]∗ such that

1. T is prefix-closed, i.e., if wi ∈ T then w ∈ T;
2. for all w ∈ T there exists i ∈ [1, k] such that wi ∈ T;
3. if wi ∈ T and j ≤ i then wj ∈ T.
Elements of T are the nodes of the tree. The empty string ε corresponds to the root. A string of the form wi corresponds to the i-th child of w. Hence by (2) each branch is infinite and by (3) the presence of the i-th child implies the presence of the j-th child for all j ≤ i.
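The closure conditions can be checked mechanically on a finite prefix of a tree. A small sketch, with nodes encoded as tuples over [1, k] (the encoding is ours; condition (2), infiniteness of branches, is not checkable on a finite set and is omitted):

```python
def is_ktree_prefix(nodes, k):
    """Check prefix-closure (1) and left-sibling closure (3) of a k-tree
    on a finite set of nodes, each a tuple of integers in [1, k]."""
    s = set(nodes)
    if () not in s:          # the root (empty string) must be present
        return False
    for w in s:
        if any(c < 1 or c > k for c in w):
            return False
        if w and w[:-1] not in s:                            # (1)
            return False
        if w and w[-1] > 1 and w[:-1] + (w[-1] - 1,) not in s:  # (3)
            return False
    return True

print(is_ktree_prefix({(), (1,), (2,), (1, 1)}, k=2))  # → True
print(is_ktree_prefix({(), (2,)}, k=2))  # → False: child 2 without child 1
```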

**Definition 8 (nondeterministic parity automaton).** *A* k-bounded nondeterministic parity tree automaton *(NPA) is a tuple* A = ⟨Q, −→, q0, F⟩ *where* Q *is a set of states,* −→ ⊆ Q × ⋃_{i=1}^{k} Q^i *is the* transition relation*,* q0 ∈ Q *is the* initial state*, and* F = (F0, ..., Fh) *is the* acceptance condition*, where* F0, ..., Fh ⊆ Q *are mutually disjoint subsets of states.*

Transitions are written as q −→ (q1,...,qm) instead of (q,(q1,...,qm)) ∈−→.

Given a k-tree T, a *run* of A on T is a labelling of T over the states, r : T → Q, consistent with the transition relation, i.e., such that r(ε) = q0 and for all u ∈ T with m children there is a transition r(u) −→ (r(u1), ..., r(um)) in A. A *path* in the run r is an infinite sequence of states p = (q0, q1, ...) labelling a complete path from the root in the tree. It is called *accepting* if there exists an even number l ∈ [0, h] such that the set {j | qj ∈ Fl} is infinite and the set {j | qj ∈ ⋃_{l<i≤h} Fi} is finite. The run r is *accepting* if all its paths are accepting.

**Definition 9 (language of an NPA).** *Let* A *be an NPA. The language of* A*, denoted by* L(A)*, consists of the trees* T *which admit an accepting run.*

Observe that for a k-bounded NPA, the language *L*(A) is a set of k-trees.

The possibility of having an infinite number of states and the associated acceptance condition are somewhat non-standard. However, it is easy to see that whenever an NPA is finite, the acceptance condition coincides with the standard one, requiring that the maximal priority occurring infinitely often along each path be even.
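For an ultimately periodic path the acceptance condition above becomes a finite check on the cycle: the states occurring infinitely often are exactly those in the cycle. A minimal sketch (the representation of paths and priority sets is ours):

```python
def path_accepting(prefix, cycle, F):
    """Parity acceptance for an ultimately periodic path prefix·cycle^ω,
    with F = (F_0, ..., F_h) mutually disjoint state sets.  The path is
    accepting iff for some even l, F_l is visited infinitely often while
    every F_i with i > l is visited only finitely often."""
    inf = set(cycle)  # states occurring infinitely often
    for l in range(0, len(F), 2):  # even indices only
        if inf & F[l] and all(not (inf & F[i]) for i in range(l + 1, len(F))):
            return True
    return False

F = ({'a'}, {'b'})
print(path_accepting(['b'], ['a'], F))    # → True: only F_0 repeats forever
print(path_accepting([], ['a', 'b'], F))  # → False: F_1 also repeats forever
```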

Since NPAs are nondeterministic, different runs (possibly infinitely many) can exist for the same input tree. Still, the non-emptiness problem, also for our k-ary variant, is decidable when the number of states is finite (and solvable by a corresponding parity game [30]).

#### **3.2 Infinite NPAs for Model Checking**

We show how, given a pes E and a closed formula ϕ in Lhp, we can build an NPA in such a way that, for strongly regular pess, the satisfaction of ϕ in E reduces to the non-emptiness of the automaton language. The construction is inspired by that in [28] for the mu-calculus.

The acceptance condition for the automaton will refer to the fixpoint alternation in the formulae of Lhp. We adapt a definition from [28]. A fixpoint formula αX(**y**).ϕ′, for α ∈ {ν, μ}, is called an α-formula. Hereafter α ranges over {ν, μ}. Given an α-formula ϕ = αX(**y**).ϕ′, we say that a subformula ψ of ϕ is a *direct active subformula*, written ψ ⊑d ϕ, if the abstract proposition X appears free in ψ. The transitive closure of ⊑d is a partial order and when ψ ⊑d∗ ϕ we say that ψ is an *active subformula* of ϕ. We denote by sf(ϕ) the set of subformulae of a formula ϕ and by sfα(ϕ) the set of its active α-subformulae.

The *alternation depth* of a formula ϕ in Lhp, written ad(ϕ), is defined, for a ν-formula ϕ, as ad(ϕ) = max{1 + ad(ψ) | ψ ∈ sfμ(ϕ)} and, dually, for a μ-formula ϕ, as ad(ϕ) = max{1 + ad(ψ) | ψ ∈ sfν(ϕ)}. For any other formula ϕ, ad(ϕ) = max{ad(ψ) | ψ ∈ sf(ϕ) \ {ϕ}}. It is intended that max ∅ = 0. E.g., by the first clause above, the alternation depth of νX(**x**).ϕ is 0 in the absence of active μ-subformulae.
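The definition can be made concrete on a toy formula AST. The following sketch uses a tuple encoding of our own ('nu'/'mu' for fixpoints, 'and'/'or' for connectives, 'prop' for abstract propositions; event variables and modalities are elided since they do not affect alternation depth):

```python
def subformulae(phi):
    """All subformulae of phi, including phi itself."""
    yield phi
    if phi[0] in ('nu', 'mu'):
        yield from subformulae(phi[2])
    elif phi[0] in ('and', 'or'):
        yield from subformulae(phi[1])
        yield from subformulae(phi[2])

def free_props(phi):
    """Abstract propositions occurring free in phi."""
    if phi[0] in ('nu', 'mu'):
        return free_props(phi[2]) - {phi[1]}
    if phi[0] in ('and', 'or'):
        return free_props(phi[1]) | free_props(phi[2])
    return {phi[1]} if phi[0] == 'prop' else set()

def active_fixpoints(phi):
    """Fixpoint subformulae active in the fixpoint formula phi:
    the transitive closure of the direct-active relation."""
    result, frontier = set(), [phi]
    while frontier:
        chi = frontier.pop()
        for s in subformulae(chi[2]):
            if s[0] in ('nu', 'mu') and chi[1] in free_props(s) and s not in result:
                result.add(s)
                frontier.append(s)
    return result

def ad(phi):
    """Alternation depth: for an alpha-formula, 1 + max ad over its
    active subformulae of the dual kind; max of the empty set is 0."""
    if phi[0] in ('nu', 'mu'):
        dual = 'mu' if phi[0] == 'nu' else 'nu'
        return max((1 + ad(s) for s in active_fixpoints(phi) if s[0] == dual),
                   default=0)
    return max((ad(s) for s in subformulae(phi) if s is not phi), default=0)

phi = ('nu', 'X', ('mu', 'Y', ('or', ('prop', 'X'), ('prop', 'Y'))))
print(ad(phi))  # → 1: one nu/mu alternation
```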

Hereafter we assume that in every formula different bound propositions have different names, so that we can refer to the fixpoint subformula quantifying an abstract proposition. This requirement can always be fulfilled by alpha-renaming.

Hereafter, if X and X′ are abstract propositions quantified in α-subformulae αX(**x**).ϕ and α′X′(**x**′).ϕ′, we will write ad(X) for ad(αX(**x**).ϕ) and X ⊑d X′ for αX(**x**).ϕ ⊑d α′X′(**x**′).ϕ′. Moreover, given a pes E, for a pair (C, η) ∈ C(E) × *Env*E and variables **x**, **y**, z, we define the set of (**x**, **y** < a z)-successors of (C, η) as

$$\mathsf{Succ}^{\mathbf{x},\overline{\mathbf{y}}<\mathsf{a}z}(C,\eta) = \{ (C',\eta[z\mapsto e]) \mid C \xrightarrow{\;\eta(\mathbf{x}),\,\overline{\eta(\mathbf{y})}\,<\,e\;}_{\mathsf{a}} C' \}$$

We can now illustrate the construction of the NPA for a formula and a pes.

**Definition 10 (NPA for a formula).** *Let* E *be a bounded* pes *and let* ϕ ∈ Lhp *be a closed formula. The NPA for* E *and* ϕ *is* AE,ϕ = ⟨Q, −→, q0, F⟩ *defined as follows. The set of states* Q ⊆ C(E) × *Env*E × sf(ϕ) *is* Q = {(C, η, ψ) | η(*fv*(ψ)) ⊆ C}*. The initial state is* q0 = (∅, η, ϕ)*, for some chosen* η ∈ *Env*E*. The transition relation is defined, for any state* q = (C, η, ψ) ∈ Q*, by:*


*The acceptance condition is* F = (F0, ..., Fh) *where* h = ad(ϕ) + 1 *and the* Fi *are as follows. Consider* A0, ..., Ah ⊆ sf(ϕ) *such that for* i ∈ [0, h]*, if* i *is even (odd) then* Ai *contains exactly the propositions quantified in* ν*-subformulae (*μ*-subformulae) with alternation depth* i *or* i − 1*. Then* F0 = (C(E) × *Env*E × (A0 ∪ {T})) ∪ B*, where* B = {(C, η, [[**x**, **y** < a z]]ψ) | Succ^(**x**,**y**<az)(C, η) = ∅} *is the set of states whose subformula is trivially true in the current context, and* Fi = C(E) × *Env*E × Ai *for* i ∈ [1, h]*.*

States of AE,ϕ are triples (C, η, ψ) consisting of a configuration C, an environment η and a subformula ψ of the original formula ϕ. The intuition is that a transition reduces the satisfaction of a formula in a state to that of its subformulae in possibly updated states. A transition can decompose the formula, as happens for ∧ or ∨, check the satisfaction of a modal operator, updating the state accordingly, or unfold a fixpoint.

The automaton AE,ϕ is bounded but normally infinite (whenever the pes E is infinite and the formula ϕ includes some non-trivial fixpoint).

We next show that for a strongly regular pes the satisfaction of the formula ϕ on the pes E reduces to the non-emptiness of the language of AE,ϕ.

**Theorem 1 (model checking via non-emptiness).** *Let* E *be a strongly regular* pes *and let* ϕ *be a closed formula in* Lhp*. Then* L(AE,ϕ) ≠ ∅ *iff* E |= ϕ*.*

We next provide an outline of the proof. A basic ingredient is an equivalence that can be defined on the NPA. As a first step we introduce a generalised notion of residual in which the relation with some selected events in the past is kept.

**Definition 11 (pointed residual).** *Given a* pes E *and a set* X*, an* X-pointed configuration *is a pair* ⟨C, ζ⟩ *where* C ∈ C(E) *and* ζ : X → C *is a function. We say that the* X*-pointed configurations* ⟨C, ζ⟩ *and* ⟨C′, ζ′⟩ *have* isomorphic pointed residuals*, written* E[C, ζ] ≈ E[C′, ζ′]*, if there is an isomorphism of* pes*s* ι : E[C] → E[C′] *such that for all* x ∈ X *and* e ∈ E[C] *we have* ζ(x) ≤ e *iff* ζ′(x) ≤ ι(e)*.*

Then two states are deemed equivalent if they involve the same subformula (up to renaming of the event variables) and the configurations, pointed by the free variables in the formulae, have isomorphic residuals. This resembles the notion of contextualised equivalence used on tableau judgments in [26].

**Definition 12 (future equivalence).** *Let* <sup>E</sup> *be a* pes*,* <sup>ϕ</sup> *be a formula and let* q<sup>i</sup> = (Ci, ηi, ψi)*,* i ∈ {1, 2} *be two states of the NPA* AE,ϕ*. We say that* q<sup>1</sup> *and* q<sup>2</sup> *are* future equivalent*, written* q<sup>1</sup> ≈<sup>f</sup> q2*, if there exists a formula* ψ *and substitutions* σ<sup>i</sup> : *fv*(ψ) → *fv*(ψi) *such that* ψσ<sup>i</sup> = ψi*, for* i ∈ {1, 2}*, and the fv*(ψ)*-pointed configurations* Ci, η<sup>i</sup> ◦ σ<sup>i</sup> *have isomorphic pointed residuals.*

It can be shown that, given qi = (Ci, ηi, ψi), i ∈ {1, 2} as above, for all proposition environments π (satisfying a technical property of saturation) we have that (C1, η1) ∈ {|ψ1|}^E_π if and only if (C2, η2) ∈ {|ψ2|}^E_π. Additionally, using strong regularity, one can prove that the semantics of fixpoint formulae is properly captured by finite approximants and that the equivalence ≈f is of finite index. These are fundamental building blocks in the proof of Theorem 1, which, roughly, proceeds as follows.

Assume that the language L(AE,ϕ) ≠ ∅. Then there is an accepting run r over some k-tree T. Since ϕ is finite, each infinite path contains infinitely many states q_ih = (C_ih, η_ih, ψ_ih) where ψ_ih is the same subformula, up to renaming. Since ≈f is of finite index, infinitely many such states are equivalent. Then one deduces that, for some h, the subformula ψ_ih is satisfied in (C_ih, η_ih). For fixpoint subformulae, this requires showing that, since the run is accepting, the subformula of maximal alternation depth that repeats infinitely often is a ν-formula, and using the fact that, as mentioned before, its semantics can be finitely approximated. Then, by a form of backward soundness of the transitions, we get that all the nodes, including the root, contain formulae which are satisfied.

For the converse implication, assume that E |= ϕ. Starting from the initial state q0 = (∅, η, ϕ), where the formula is satisfied, and using the automaton transitions, we can build a k-tree T and a run in which, for each state (C′, η′, ψ), the subformula ψ is satisfied in (C′, η′); such a run can be proved to be accepting.

#### **3.3 Quotienting the Automaton**

In order to have an effective procedure for checking the satisfaction of a formula we need to build a suitable quotient of the NPA, with respect to an equivalence which preserves emptiness. A simple but important observation is that it is sufficient to require that the equivalence is a bisimulation in the following sense. An analogous notion is studied in [31] in the setting of nondeterministic tree automata over finite trees.

**Definition 13 (bisimulation).** *Given an NPA* A*, a symmetric relation* R ⊆ Q × Q *over the set of states is a* bisimulation *if for all* (q, q′) ∈ R

*1. for all* i ∈ [0, h]*,* q ∈ Fi ⟺ q′ ∈ Fi*;*
*2. if* q −→ (q1, ..., qm) *then* q′ −→ (q′1, ..., q′m) *with* (qi, q′i) ∈ R *for* i ∈ [1, m]*.*

Given an NPA A and an equivalence ≡ on the set of states which is a bisimulation, we define the quotient as A/≡ = ⟨Q/≡, −→/≡, [q0]≡, F/≡⟩, where [q]≡ −→/≡ ([q1]≡, ..., [qm]≡) if q −→ (q1, ..., qm), and F/≡ = (F0/≡, ..., Fh/≡). An NPA and its quotient accept exactly the same language.

**Theorem 2 (language preservation).** *Let* <sup>A</sup> *be an NPA and let* <sup>≡</sup> *be an equivalence on the set of states which is a bisimulation. Then L*(A*/*≡) = *L*(A)*.*
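The quotient construction is a direct image under the class map. A minimal sketch on a finite NPA (the representation, with transitions as (state, tuple-of-successors) pairs and the equivalence as a map to class representatives, is ours):

```python
def quotient_npa(states, trans, q0, F, cls):
    """Quotient an NPA by an equivalence given as a map cls: state -> class.
    Assumes cls is a bisimulation (respects priority sets and transitions),
    so by Theorem 2 the quotient accepts the same language."""
    q_states = {cls[q] for q in states}
    q_trans = {(cls[q], tuple(cls[x] for x in tgt)) for (q, tgt) in trans}
    q_F = [frozenset(cls[q] for q in Fi) for Fi in F]
    return q_states, q_trans, cls[q0], q_F

states = {'a', 'b', 'c'}
trans = {('a', ('b',)), ('a', ('c',)), ('b', ('b',)), ('c', ('c',))}
cls = {'a': 0, 'b': 1, 'c': 1}  # b and c are bisimilar
qs, qt, qi, qF = quotient_npa(states, trans, 'a', [{'b', 'c'}], cls)
print(len(qt))  # → 2: a's two transitions collapse into one
```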

When ≡ is of finite index, the quotient AE,ϕ/≡ is finite and, exploiting Theorems 1 and 2, we can verify whether E |= ϕ by checking the emptiness of the language accepted by AE,ϕ/≡. Clearly, a concrete algorithm will not first generate the infinite-state NPA and then take the quotient; rather, it performs the quotient on the fly: whenever a new state would be equivalent to one already generated, the transition loops back to the existing state.

Whenever E is strongly regular, the future equivalence on states (see Definition 12) provides a bisimulation equivalence of finite index over AE,ϕ.

**Lemma 1 (**≈<sup>f</sup> **is a bisimulation).** *Let* <sup>E</sup> *be a strongly regular* pes *and let* ϕ *be a closed formula in* Lhp*. Then the future equivalence* ≈<sup>f</sup> *on* AE,ϕ *is a bisimulation and it is of finite index.*

An obstacle towards the use of the quotiented NPA for model checking purposes is the fact that the future equivalence could be hard to compute (or even undecidable). In order to make the construction effective we need a decidable bisimulation equivalence on the NPA and the effectiveness of the set of successors of a state. This is further discussed in the next section.

### **4 Model Checking Petri Nets**

We show how the model checking approach outlined before can be instantiated on finite safe Petri nets, a classical model of concurrency and distribution [32], by identifying a suitable effective bisimulation equivalence on the NPA.

#### **4.1 Petri Nets and Their Event Structure Semantics**

A *Petri net* is a tuple N = (P, T, F, M0) where P and T are disjoint sets of *places* and *transitions*, respectively, F : (P × T) ∪ (T × P) → {0, 1} is the *flow function*, and M0 is the initial marking, i.e., the initial state of the net. We assume that the set of transitions is a subset of a fixed set T with a labelling λN : T → Λ.

A *marking* of N is a function M : P → ℕ, indicating for each place the number of tokens in the place. A transition t ∈ T is *enabled* at a marking M if M(p) ≥ F(p, t) for all p ∈ P. In this case it can be *fired*, leading to a new marking M′ defined by M′(p) = M(p) + F(t, p) − F(p, t) for all places p ∈ P. This is written M[t⟩M′. We denote by R(N) the set of markings reachable in N via a sequence of firings starting from the initial marking. We say that a marking M is *coverable* if there exists M′ ∈ R(N) such that M ≤ M′, pointwise. A net N is *safe* if for every reachable marking M ∈ R(N) and all p ∈ P we have M(p) ≤ 1. Hereafter we will consider only safe nets. Hence markings will often be identified with the corresponding subset of places {p | M(p) = 1} ⊆ P. For x ∈ P ∪ T the *pre-set* and *post-set* are defined as •x = {y ∈ P ∪ T | F(y, x) = 1} and x• = {y ∈ P ∪ T | F(x, y) = 1}, respectively.
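Since markings of a safe net are just sets of places, the firing rule specialises to set operations. A minimal sketch (place names are illustrative):

```python
def enabled(M, pre):
    """A transition with pre-set `pre` is enabled at marking M (a set of
    places, which is adequate for safe nets) iff every input place holds
    a token."""
    return pre <= M

def fire(M, pre, post):
    """Fire the transition: consume its pre-set, produce its post-set."""
    assert enabled(M, pre)
    return (M - pre) | post

M0 = {'p1', 'p2'}
print(enabled(M0, {'p1'}))               # → True
print(sorted(fire(M0, {'p1'}, {'p3'})))  # → ['p2', 'p3']
```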

An example of Petri net can be found in Fig. 1b. Graphically places and transitions are drawn as circles and rectangles, respectively, while the flow function is rendered by means of directed arcs connecting places and transitions. Markings are represented by inserting tokens (black dots) in the corresponding places.

The concurrent behaviour of a Petri net can be represented by its unfolding U(N ), an acyclic net constructed inductively starting from the initial marking of N and then adding, at each step, an occurrence of each enabled transition.

**Definition 14 (unfolding).** *Let* N = (P, T, F, m0) *be a safe net. Define the net* U^(0) = (P^(0), T^(0), F^(0)) *as* T^(0) = ∅*,* P^(0) = {(p, ⊥) | p ∈ m0} *and* F^(0) = ∅*, where* ⊥ *is an element not belonging to* P*,* T *or* F*. The unfolding is the least net* U(N) = (P^(ω), T^(ω), F^(ω)) *containing* U^(0) *and such that*


Places and transitions in the unfolding represent tokens and firing of transitions, respectively, of the original net. The projection π<sup>1</sup> over the first component maps places and transitions of the unfolding to the corresponding items of the original net N . The initial marking is implicitly identified as the set of minimal places. For historical reasons transitions and places in the unfolding are also called *events* and *conditions*, respectively.

One can define *causality* ≤N over the unfolding as the transitive closure of the flow relation. *Conflict* is the relation obtained by letting e # e′ for distinct events e, e′ with •e ∩ •e′ ≠ ∅, and inheriting it along causality. The events T^(ω) of the unfolding of a finite safe net, endowed with causality and conflict, form a pes, denoted E(N). The transitions of a configuration C ∈ C(E(N)) can be fired in any order compatible with causality, producing a marking C° = (P^(0) ∪ ⋃_{t∈C} t•) \ (⋃_{t∈C} •t) in U(N); in turn, this corresponds to a reachable marking of N given by M(C) = π1(C°). As an example, the unfolding U(N) of the running example net N and the corresponding pes can be found in Figs. 1c and 1a.
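The formula for M(C) translates directly into code. A small sketch over an explicit unfolding fragment (all names are illustrative; conditions are strings, `pi1` plays the role of the projection π1):

```python
def marking_of_configuration(initial, pre, post, C, pi1):
    """Marking of the original net reached by configuration C of the
    unfolding: start from the initial conditions, add everything the
    events of C produced, remove everything they consumed, project."""
    consumed = set().union(*(pre[t] for t in C))
    produced = set().union(*(post[t] for t in C))
    c_final = (initial | produced) - consumed   # the marking C° in U(N)
    return {pi1[b] for b in c_final}            # M(C) = pi1(C°)

# One event e1 consuming condition b1 (place p1) and producing b2 (place p2):
m = marking_of_configuration({'b1'}, {'e1': {'b1'}}, {'e1': {'b2'}},
                             {'e1'}, {'b1': 'p1', 'b2': 'p2'})
print(sorted(m))  # → ['p2']
```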

#### **4.2 Automata Model Checking for Petri Nets**

The pes associated with a safe Petri net is known to be regular [27]. We next prove that it is also strongly regular and thus we can apply the theory developed so far for model checking Lhp over safe Petri nets.

Let N = S, T, F, M0 be a safe Petri net. A basic observation is that the residual of the pes E(N ) with respect to a configuration C ∈ C(E(N )) is uniquely determined by the marking produced by C. This correspondence can be extended to pointed configurations by considering markings which additionally record, for the events of interest in the past, the places in the marking which are caused by such events. This motivates the definition below.

**Definition 15 (pointed marking).** *Let* N = ⟨S, T, F, M0⟩ *be a safe Petri net. Given a set* X*, an* X-pointed marking *is a pair* ⟨M, r⟩ *with* r : X → 2^M*.*

An X-pointed configuration ⟨C, ζ⟩ induces an X-pointed marking M(C, ζ) = ⟨M(C), r⟩, where r(x) = {π1(b) | b ∈ C° ∧ ζ(x) < b}. Pointed configurations producing the same pointed marking have isomorphic pointed residuals.

**Proposition 1 (pointed markings vs residuals).** *Let* <sup>N</sup> <sup>=</sup> S, T, F, M0 *be a safe Petri net. Given a set* X *and two* X*-pointed configurations* C1, ζ1*,* C2, ζ2 *in* <sup>U</sup>(<sup>N</sup> )*, if* <sup>M</sup>(C1, ζ1) = <sup>M</sup>(C2, ζ2) *then* <sup>E</sup>(<sup>N</sup> )[C1, ζ1] ≈ E(<sup>N</sup> )[C2, ζ2]*.*

By the previous result the pes associated with a finite safe Petri net is strongly regular. Indeed, by Proposition 1, the number of residuals of X-pointed configurations, up to isomorphism, is bounded by the number of X-pointed markings, which is clearly finite since the net is safe.

**Corollary 1 (strong regularity).** *Let* N *be a finite safe Petri net. Then the corresponding* pes E(N) *is strongly regular.*

In order to instantiate the model checking framework to finite safe Petri nets, the idea is to take an equivalence over the infinite NPA by abstracting the (pointed) configurations associated with its states to pointed markings.

**Definition 16 (pointed-marking equivalence on NPA).** *Let* <sup>N</sup> *be a finite safe Petri net and let* ϕ *be a closed formula in* Lhp*. Two states* q1*,* q<sup>2</sup> *in the NPA* AE(N),ϕ *are* pointed-marking equivalent*, written* q<sup>1</sup> ≈<sup>m</sup> q2*, if* q<sup>i</sup> = Ci, ηi, ψ*,* <sup>i</sup> ∈ {1, <sup>2</sup>}*, for some* <sup>ψ</sup> <sup>∈</sup> sf(ϕ) *and* <sup>M</sup>(C1, η1|*fv*(ψ)) = <sup>M</sup>(C2, η2|*fv*(ψ))*.*

Using Proposition 1 we can immediately prove that ≈<sup>m</sup> refines ≈<sup>f</sup> . Moreover we can show that ≈<sup>m</sup> is a bisimulation in the sense of Definition 13.

**Proposition 2 (marking equivalence is a bisimulation).** *Let* <sup>N</sup> *be a finite safe Petri net and let* ϕ *be a closed formula in* Lhp*. The equivalence* ≈<sup>m</sup> *on the automaton* AE(N),ϕ *is a bisimulation and it is of finite index.*

Relying on Propositions 1 and 2 we provide an explicit construction of the quotient automaton AE(N),ϕ/≈m. We introduce a convenient notation for transitions between pointed markings. Given variables **x**, **y**, a set X such that **x** ∪ **y** ⊆ X and an X-pointed marking ⟨M, r⟩, we write ⟨M, r⟩ −(**x**, **y** < t)→_{a,z} ⟨M′, r′⟩ if M[t⟩M′, λN(t) = a, for all x ∈ **x** we have r(x) ∩ •t ≠ ∅, for all y ∈ **y** it holds that r(y) ∩ •t = ∅, and r′ is defined by r′(z) = t• and r′(w) = (r(w) ∩ M′) ∪ {s | r(w) ∩ •t ≠ ∅ ∧ s ∈ t•} for w ≠ z. In words, from the pointed marking ⟨M, r⟩ transition t is fired and "pointed" by variable z. Transition t is required to consume tokens caused by **x** and not to consume tokens caused by **y**, in order to be itself caused by **x** and independent of **y**. After the firing, variables which were causes of some p ∈ •t become causes of the places in t•, and, clearly, z causes t•.

**Construction 1 (quotient NPA).** *Let* N *be a finite safe Petri net and let* ϕ ∈ Lhp *be a closed formula. The quotient NPA* AE(N),ϕ/≈m *is defined as follows. The set of states is* Q = {(M, r, ψ) | M ∈ R(N) ∧ r : *fv*(ψ) → 2^M ∧ ψ ∈ sf(ϕ)}*. The initial state is* q0 = (M0, ∅, ϕ)*. The transition relation is defined, for any state* q = (M, r, ψ) ∈ Q*, by:*


*The acceptance condition is as in Definition 10.*
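The pointed-marking transition relation described above can be sketched as a single step function over safe-net markings represented as sets (the names and the set-based representation are ours, not from the paper or the tool):

```python
def pointed_step(M, r, t_pre, t_post, xs, ys, z):
    """Fire transition t (given by its pre/post-sets) from the pointed
    marking <M, r>, binding the new event to variable z.  Returns the new
    pointed marking, or None if the side conditions fail: t must consume
    a token caused by every variable in xs and none caused by a variable
    in ys."""
    if not t_pre <= M:
        return None                       # t not enabled at M
    if any(not (r[x] & t_pre) for x in xs):
        return None                       # t must be caused by every x
    if any(r[y] & t_pre for y in ys):
        return None                       # t must be independent of every y
    M2 = (M - t_pre) | t_post
    r2 = {z: set(t_post)}                 # z causes exactly the post-set
    for w, places in r.items():
        if w == z:
            continue
        # causes of a consumed token become causes of the produced ones
        r2[w] = (places & M2) | (set(t_post) if places & t_pre else set())
    return M2, r2

M2, r2 = pointed_step({'p1', 'p2'}, {'x': {'p1'}}, {'p1'}, {'p3'},
                      ['x'], [], 'z')
print(sorted(M2), sorted(r2['x']))  # → ['p2', 'p3'] ['p3']
```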

#### **4.3 A Prototype Tool**

The algorithm for model checking Petri nets outlined above is implemented in the prototype tool TCWB (*True Concurrency Workbench*) [33], written in Haskell. The tool takes as input a safe Petri net N and a closed formula ϕ of Lhp, and outputs the truth value of the formula on the initial marking of N. The algorithm builds the quotient NPA AE(N),ϕ/≈m "on demand", i.e., the states of the automaton are generated as they are explored in the search for an accepting run. A path is recognised as successful when it includes a loop in which a ⊑d∗-maximal subformula is T, a [[ ]]-subformula or a ν-subformula. In this way only the fragment of AE(N),ϕ/≈m relevant to deciding the satisfaction of ϕ is built.

Given a net N = (P, T, F, M0) and a formula ϕ, the number of states in the quotient automaton AE(N),ϕ/≈m can be bounded as follows. Recall that a state is a triple (M, r, ψ) where ψ ∈ sf(ϕ), M is a reachable marking and r : *fv*(ψ) → 2^M is a function. This leads to an upper bound O(|sf(ϕ)| · |R(N)| · 2^(|P|·v)), where v = max{|*fv*(ψ)| : ψ ∈ sf(ϕ)} is the largest number of event variables appearing free in a subformula of ϕ. In turn, since |R(N)| ≤ 2^|P|, this is bounded by O(|sf(ϕ)| · 2^(|P|·(v+1))). The size of the automaton is thus exponential in the size of the net and linear in the size of the formula. Moving from the interleaving fragment of the logic (where v = 0) to formulae capable of expressing truly concurrent properties thus causes an exponential blow-up. However, the worst case requires all transitions to be related by causality and concurrency to all places in every possible way, which seems quite unlikely in practice. Indeed, although the tool is very preliminary and further tweaks and optimisations could improve its efficiency, in the practical tests we performed the execution time was typically well below the theoretical worst-case upper bound.
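The bound can be sanity-checked with illustrative numbers, which also makes the exponential gap between the interleaving fragment and the truly concurrent one concrete (the concrete figures are ours, not measurements from the tool):

```python
def state_bound(n_subformulae, n_places, v):
    """Upper bound |sf(phi)| * 2^(|P| * (v + 1)) on the number of states
    of the quotient NPA, with v free event variables per subformula."""
    return n_subformulae * 2 ** (n_places * (v + 1))

# 10 subformulae, 5 places:
print(state_bound(10, 5, 0))  # → 320: interleaving fragment (v = 0)
print(state_bound(10, 5, 1))  # → 10240: one free event variable (v = 1)
```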

### **5 Conclusions**

We introduced an automata-theoretic framework for the model checking of the logic for true concurrency Lhp, representing the logical counterpart of a classical true concurrent equivalence, i.e., history preserving bisimilarity. The approach is developed abstractly for strongly regular pess, that include regular trace pess. A concrete model-checking procedure requires the identification of an effective bisimulation equivalence for the construction of the quotient automaton. We showed how this can be done for finite safe Petri nets. The technique is implemented in a proof-of-concept tool.

We proved that the class of regular trace pess is included in that of strongly regular pess, which in turn is included in the class of regular pess. The precise relation between strongly regular pess and the other two classes is still unclear, a question made more interesting by [34], which recently showed that regular trace pess are strictly included in regular pess, disproving Thiagarajan's conjecture.

Several other papers deal with model checking for logics on event structures. In [35] a technique is proposed for model checking a CTL-style logic with modalities for immediate causality and conflict on a subclass of pess. The logic is quite different from ours, as formulae are satisfied by single events, the idea being that an event, with its causes, represents the local state of a component. The procedure involves the construction of a finite representation of the pes associated with a program, which bears some conceptual relation to our quotienting phase. In [19] the author shows that first-order logic and Monadic Trace Logic (MTL), a restricted form of monadic second-order (MSO) logic, are decidable on regular trace event structures. The possibility of directly observing conflicts in MTL, and thus of distinguishing behaviourally equivalent pess (e.g., the pess consisting of a single event or of two conflicting copies of an event), together with the presence in Lhp of propositions which are non-monadic with respect to event variables, makes these logics difficult to compare directly. Still, a deeper investigation is definitely worth pursuing, especially in view of the fact that, in the propositional case, the mu-calculus corresponds to the bisimulation-invariant fragment of MSO logic [36].

The work summarised in [18] develops a game-theoretic approach for model checking a concurrent logic over partial order models. It has been observed in [20] that such a logic is incomparable to Lhp. Preliminary investigations show that our model-checking framework could be adapted to such a logic and, more generally, to a logic joining the expressive power of the two. Moreover, further exploring the potential of a game-theoretic approach in our setting represents an interesting avenue for further research.

Compared to our previous work [26], we extended the range of the technique to the full logic Lhp, without limitations concerning the alternation depth of formulae. Relaxing the restriction to strongly regular pess, instead, appears to be quite problematic unless one is willing to deal with transfinite runs which, however, would be of very limited practical interest.

The tool is still very preliminary. As suggested by its (wishful) name (inspired by the classical Edinburgh Concurrency Workbench [37]) we would like to bring the TCWB to a more mature stage, working on optimisations and adding an interface that gives access to a richer set of commands.

**Acknowledgements.** We are grateful to Perdita Stevens for insightful hints and pointers to the literature and to the anonymous reviewers for their comments.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **A Theory of Encodings and Expressiveness (Extended Abstract)**

Rob van Glabbeek1,2(B)

<sup>1</sup> Data61, CSIRO, Sydney, Australia rvg@cs.stanford.edu <sup>2</sup> Computer Science and Engineering, University of New South Wales, Sydney, Australia

**Abstract.** This paper proposes a definition of what it means for one system description language to encode another one, thereby enabling an ordering of system description languages with respect to expressive power. I compare the proposed definition with other definitions of encoding and expressiveness found in the literature, and illustrate it on a wellknown case study: the encoding of the synchronous in the asynchronous π-calculus.

### **1 Introduction**

This paper, like [16,21], aims at answering the question of what it means for one language to encode another, and at making the resulting definition applicable for ordering system description languages such as CCS, CSP and the π-calculus with respect to their expressive power.

To this end it proposes a unifying concept of valid translation between two languages *up to* a semantic equivalence or preorder. It applies to languages whose semantics interprets the operators and recursion constructs as operations on a set of values, called a *domain*. Languages can be partially ordered by their expressiveness up to the chosen equivalence or preorder according to the existence of valid translations between them.

The concept of a [valid] translation between system description languages (or *process calculi*) was first formally defined by Boudol [3]. There, and in most other related work in this area, the domain in which a system description language is interpreted consists of the closed expressions from the language itself. In [14] I have reformulated Boudol's definition, while dropping the requirement that the domain of interpretation is the set of closed terms. This allows (but does not enforce) a clear separation of syntax and semantics, in the tradition of universal algebra. Nevertheless, the definition employed in [14] only deals with the case that all (relevant) elements in the domain are denotable as the interpretations of closed terms. In [16] situations are described where such a restriction is undesirable. In addition, both [3,14] require the semantic equivalence ∼ under which two languages are compared to be a congruence for both of them. This is too severe a restriction to capture many recent encodings [1,2,7,30,31,33,38,43].

In [16] I alleviated these two restrictions by proposing two notions of encoding: *correct* and *valid* translations up to ∼. Each of them generalises the proposals of [3,14]. The former drops the restriction on denotability as well as ∼ being a congruence for the whole target language, but it requires ∼ to be a congruence for the source language, as well as for the source's image within the target. The latter drops both congruence requirements (and allows ∼ to be a preorder rather than an equivalence), but at the expense of requiring denotability by closed terms. In situations where ∼ is a congruence for the source language's image within the target language *and* all semantic values are denotable, the two notions agree.

The current paper further generalises the work of [16] by proposing a new notion of a valid translation that incorporates the correct and valid translations of [16] as special cases. It drops the congruence requirements as well as the restriction on denotability.

As in [16], my aim is to generalise the concept of a valid translation as much as possible, so that it is uniformly applicable in many situations, and not just in the world of process calculi. Also, it needs to be equally applicable to encodability and separation results, the latter saying that an encoding of one language in another does not exist. At the same time, I try to derive this concept from a unifying principle, rather than collecting a set of criteria tailored to justify a number of intuitively plausible encodability and separation results.

*Overview of the Paper.* Section 2 defines my new concept of a valid translation up to a semantic equivalence or preorder •∼. Roughly, a valid translation of one language into another is a mapping from the expressions in the first language to those in the second that preserves their meaning, i.e. such that the meaning of a translated expression is semantically equivalent to the meaning of the original.

Section 3 shows that this concept generalises the notion of a correct translation from [16]: a translation is correct up to a semantic equivalence ∼ iff it is valid up to ∼ and ∼ is a congruence for the source language as well as for the image of the source language within the target language.

Likewise, [18]—the full version of this paper—establishes the coincidence of my validity-based notion of expressiveness with the one from [16] when applying both to languages for which all semantic values are denotable by closed terms.

One language is said to be at least as expressive as another up to •∼ iff there exists a valid translation up to •∼ of the latter language into the former. Section 4 shows that "being at least as expressive as" is a preorder on languages. This expressiveness preorder depends on the choice of •∼, and a coarser choice (making fewer distinctions) yields a richer preorder of expressiveness inclusions.

Section 6 illustrates the framework on a well-known case study: the encoding of the synchronous in the asynchronous π-calculus.

Section 7 discusses the *congruence closure* of a semantic equivalence for a given language, and remarks that in the presence of operators with infinite arity it is not always a congruence. Section 8 states a useful congruence closure property for valid translations: if a translation between two languages exists that is valid up to a semantic equivalence ∼, then there is one that is even valid up to the congruence closure of ∼.


Section 9 concludes that the framework established thus far is great for comparing the expressiveness of languages, but falls short for the purpose of combining language features. This requires a congruence reflection theorem, provided in Sect. 12, for languages satisfying postulates formulated in Sects. 5, 10 and 11.

Section 12 defines when a translation is *compositional*, and shows that any valid translation up to •∼ can be modified into a compositional translation valid up to •∼. This requires restricting attention to languages and preorders •∼ that satisfy some mild sanity requirements—the postulates of Sects. 10 and 11. Hence, for the purpose of comparing the expressive power of languages, valid translations between them may be presumed compositional.

Section 13 compares my approach with the one of Gorla [21], and concludes. Omitted proofs and counterexamples (marked by ¶) can be found in [18].

### **2 Languages, Valid Translations, and Expressiveness**

A language consists of *syntax* and *semantics*. The syntax determines the valid expressions in the language. The semantics is given by a mapping [ ] that associates with each valid expression its meaning, which can for instance be an object, concept or statement.

Following [16], I represent a language L as a pair (T<sub>L</sub>, [ ]<sub>L</sub>) of a set T<sub>L</sub> of valid expressions in L and a mapping [ ]<sub>L</sub> : T<sub>L</sub> → D<sub>L</sub> from T<sub>L</sub> into some set of meanings D<sub>L</sub>.

**Definition 1 (**[16]**).** A *translation* from a language L into a language L′ is a mapping T : T<sub>L</sub> → T<sub>L′</sub>.

In this paper, I consider single-sorted languages L in which *expressions* or *terms* are built from variables (taken from a set X) by means of operators (including constants) and possibly recursion constructs. For such languages the meaning [E]<sub>L</sub> of an L-expression E is a function of type (X → **V**) → **V** for a given set of *values* **V**. It associates to E a value [E]<sub>L</sub>(ρ) ∈ **V** that depends on the choice of a *valuation* ρ : X → **V**. The valuation associates a value from **V** with each variable.
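As a concrete toy illustration of this setup, not taken from the paper, the following Python sketch interprets a small invented language of variables, integer literals and addition, with **V** the integers; `meaning` plays the role of [E]<sub>L</sub>, mapping a valuation ρ : X → **V** to a value.

```python
# Toy instance: expressions denote functions (X -> V) -> V, with V = integers.
# All names here are illustrative, not from the paper.
from dataclasses import dataclass

class Expr: pass

@dataclass(frozen=True)
class Var(Expr):
    name: str

@dataclass(frozen=True)
class Lit(Expr):
    value: int

@dataclass(frozen=True)
class Add(Expr):
    left: Expr
    right: Expr

def meaning(e, rho):
    """[E](rho): the value of expression e under valuation rho : X -> V."""
    if isinstance(e, Var):
        return rho[e.name]
    if isinstance(e, Lit):
        return e.value
    if isinstance(e, Add):
        return meaning(e.left, rho) + meaning(e.right, rho)

# The meaning of X + 3 depends only on the value the valuation gives to X.
assert meaning(Add(Var("X"), Lit(3)), {"X": 4}) == 7
```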

Since normally the names of variables are irrelevant and the cardinality of the set of variables satisfies only the requirement that it is "sufficiently large", no generality is lost by insisting that two (system description) languages whose expressiveness is being compared employ the same set of (process) variables. On the other hand, two languages L and L′ may be interpreted in different domains of values **V** and **V′**.

Let L and L′ be languages as considered above, with semantic mappings

$$[\ ]_{\mathcal{L}} \colon \mathbb{T}_{\mathcal{L}} \to ((\mathcal{X} \to \mathbf{V}) \to \mathbf{V}) \quad \text{and} \quad [\ ]_{\mathcal{L}'} \colon \mathbb{T}_{\mathcal{L}'} \to ((\mathcal{X} \to \mathbf{V}') \to \mathbf{V}').$$

In order to compare these languages w.r.t. their expressive power I need a semantic equivalence or preorder •∼ that is defined on a unifying domain of interpretation **Z**, with **V**, **V′** ⊆ **Z**.<sup>1</sup> Intuitively, v′ •∼ v with v′ ∈ **V′** and v ∈ **V** means that the values v′ and v are sufficiently alike for our purposes, so that one can accept a translation of an expression with meaning v into an expression with meaning v′. Below, target values of a translation (in **V′**) are written on the left.

*Correct* and *valid* translations up to a semantic equivalence or preorder •∼ were introduced in [16]. Here I redefine these concepts in terms of a new concept of *correctness w.r.t. a semantic translation*.

**Definition 2.** Let **V** and **V′** be domains of values in which two languages L and L′ are interpreted. A *semantic translation* from **V** into **V′** is a relation **R** ⊆ **V′** × **V** such that ∀v ∈ **V**. ∃v′ ∈ **V′**. v′ **R** v.

Thus every semantic value in **V** needs to have a counterpart in **V′**, possibly multiple ones. For valuations η : X → **V′** and ρ : X → **V** I write η **R** ρ iff η(X) **R** ρ(X) for each X ∈ X.

**Definition 3.** A translation T : T<sub>L</sub> → T<sub>L′</sub> is *correct* w.r.t. a semantic translation **R** if [T(E)]<sub>L′</sub>(η) **R** [E]<sub>L</sub>(ρ) for all expressions E ∈ T<sub>L</sub> and all valuations η : X → **V′** and ρ : X → **V** with η **R** ρ.

Thus T is correct iff the meaning of the translation of an expression E is a counterpart of the meaning of E, no matter what values are filled in for the variables, provided that the value filled in for a given variable X occurring in the translation T(E) is a counterpart of the value filled in for X in E.
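On finite domains, correctness in the sense of Definition 3 can be checked by brute force. The following Python sketch is an invented toy instance (the two languages, the relation `R`, and all names are constructed for this example only): a source language of booleans with negation is translated into a target language of bits with a flip operation, and correctness w.r.t. **R** is verified over all valuations of a few expressions.

```python
# Brute-force check of Definition 3 on invented finite toy languages.
# Source: booleans with Not; target: bits mod 2 with Flip.
# R relates a bit v' to a boolean v exactly when v' == int(v).
from itertools import product

def sem_src(e, rho):                       # [E]_L(rho) for the source language
    op, *args = e
    if op == "var": return rho[args[0]]
    if op == "not": return not sem_src(args[0], rho)

def sem_tgt(e, eta):                       # [E']_L'(eta) for the target language
    op, *args = e
    if op == "var": return eta[args[0]]
    if op == "flip": return (sem_tgt(args[0], eta) + 1) % 2

def translate(e):                          # the candidate translation T
    op, *args = e
    if op == "var": return e
    if op == "not": return ("flip", translate(args[0]))

R = {(0, False), (1, True)}                # semantic translation, R ⊆ V' x V

def correct_wrt_R(exprs, variables):
    for e in exprs:
        for bools in product([False, True], repeat=len(variables)):
            rho = dict(zip(variables, bools))
            eta = {x: int(v) for x, v in rho.items()}  # the unique R-counterpart
            if (sem_tgt(translate(e), eta), sem_src(e, rho)) not in R:
                return False
    return True

exprs = [("var", "X"), ("not", ("var", "X")), ("not", ("not", ("var", "X")))]
assert correct_wrt_R(exprs, ["X"])
```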

**Definition 4.** A translation T : T<sub>L</sub> → T<sub>L′</sub> is *correct* up to •∼ iff •∼ is an equivalence, the restriction **R** of •∼ to **V′** × **V** is a semantic translation, and T is correct w.r.t. **R**.

**Definition 5.** A translation T is *valid* up to •∼ iff it is correct w.r.t. some semantic translation **R** ⊆ •∼. Language L′ is at least as *expressive* as L up to •∼ if a translation valid up to •∼ from L into L′ exists.

Example 4 in [18] illustrates both notions and shows their difference.

<sup>1</sup> I will be chiefly interested in the case that •<sup>∼</sup> is an equivalence—hence the choice of a symbol that looks like ∼. However, to establish Observation 2 and Theorem 2 below, it suffices to know that •∼ is reflexive and transitive. My convention is that the dotted end of •∼ points to a translation and the other end to an original—without offering an intuition for the possible asymmetry.

### **3 Correct = Valid + Congruence**

In [16] the concept of a correct translation up to ∼ was defined, for ∼ a semantic equivalence on **Z**. Here two valuations η, ρ : X → **Z** are called ∼-*equivalent*, η ∼ ρ, if η(X) ∼ ρ(X) for each X ∈ X. In case there exists a v ∈ **V** for which there is no ∼-equivalent v′ ∈ **V′**, there is no correct translation from L into L′ up to ∼. Namely, the semantics of L describes, among other things, how any L-operator evaluates the argument value v, and this aspect of the language has no counterpart in L′. Therefore, [16] requires

$$\forall v \in \mathbf{V}. \; \exists v' \in \mathbf{V}'. \; v' \sim v. \tag{1}$$

This implies that for any valuation ρ : X → **V** there is an η : X → **V′** with η ∼ ρ.

**Definition 6 (**[16]**).** A translation T from L into L′ is *correct up to* ∼ iff (1) holds and [T(E)]<sub>L′</sub>(η) ∼ [E]<sub>L</sub>(ρ) for all E ∈ T<sub>L</sub> and all valuations η : X → **V′** and ρ : X → **V** with η ∼ ρ.

Note that this definition agrees completely with Definition 4. Requirement (1) above corresponds to **R** being a semantic translation in Definition 4.

If a correct translation up to ∼ from L into L exists, then ∼ must be a congruence for L.

**Definition 7.** An equivalence relation ∼ is a *congruence* for a language L interpreted in a semantic domain **V** if [E]<sub>L</sub>(ν) ∼ [E]<sub>L</sub>(ρ) for any L-expression E and any valuations ν, ρ : X → **V** with ν ∼ ρ.<sup>2</sup>
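Definition 7 is likewise decidable by enumeration when the domain is finite. A hedged Python sketch, with an invented one-operator language on **V** = {0, 1, 2, 3} and ∼ taken to be equality of parities:

```python
# Brute-force check of Definition 7 on a finite domain: ~ is a congruence
# for L when ~-equivalent valuations give ~-equivalent meanings.
from itertools import product

V = [0, 1, 2, 3]

def equiv(v, w):                 # v ~ w  iff  v and w have the same parity
    return v % 2 == w % 2

def sem(e, rho):                 # invented language: variables and double(E)
    op, *args = e
    if op == "var": return rho[args[0]]
    if op == "double": return (2 * sem(args[0], rho)) % 4

def is_congruence(exprs, variables):
    for e in exprs:
        for vals1 in product(V, repeat=len(variables)):
            for vals2 in product(V, repeat=len(variables)):
                nu, rho = dict(zip(variables, vals1)), dict(zip(variables, vals2))
                if all(equiv(nu[x], rho[x]) for x in variables):
                    if not equiv(sem(e, nu), sem(e, rho)):
                        return False
    return True

# double maps evens to 0 and odds to 2, so it respects parity classes.
exprs = [("var", "X"), ("double", ("var", "X")), ("double", ("double", ("var", "X")))]
assert is_congruence(exprs, ["X"])
```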

**Proposition 1 (**[16]**).** If T is a correct translation up to ∼ from L into L′, then ∼ is a congruence for L.

The existence of a correct translation up to ∼ from L into L′ does not imply that ∼ is a congruence for L′. However, ∼ has the properties of a congruence for those expressions of L′ that arise as translations of expressions of L, when restricting attention to valuations into **U** := {v′ ∈ **V′** | ∃v ∈ **V**. v′ ∼ v}. In [16] this is called a *congruence for* T(L).

**Definition 8.** Let T : T<sub>L</sub> → T<sub>L′</sub> be a translation from L into L′. An equivalence ∼ on **V′** is a *congruence for* T(L) if [T(E)]<sub>L′</sub>(θ) ∼ [T(E)]<sub>L′</sub>(η) for any E ∈ T<sub>L</sub> and θ, η : X → **U** with θ ∼ η.

**Proposition 2 (**[16]**).** If T is a correct translation up to ∼ from L into L′, then ∼ is a congruence for T(L).

The following theorem states that the notion of validity proposed in Sect. 2 can be seen as a generalisation of the notion of correctness from [16] that applies to equivalences (and preorders) •∼ that need not be congruences for L or T(L).

**Theorem 1.** A translation T from L into L′ is correct up to a semantic equivalence ∼ iff it is valid up to ∼ and ∼ is a congruence for T(L). ¶

<sup>2</sup> This is called a *lean* congruence in [17]; in the presence of recursion, stricter congruence requirements are common. Those are not needed in this paper.

### **4 A Hierarchy of Expressiveness Preorders**

An equivalence or preorder •<sup>∼</sup> on a class **<sup>Z</sup>** is said to be *finer*, *stronger*, or *more discriminating* than another equivalence or preorder •<sup>≈</sup> on **<sup>Z</sup>** if *<sup>v</sup>* •<sup>∼</sup> *<sup>w</sup>* <sup>⇒</sup> *<sup>v</sup>* •<sup>≈</sup> *<sup>w</sup>* for all *<sup>v</sup>*,*<sup>w</sup>* <sup>∈</sup> **<sup>Z</sup>**.

**Observation 1.** Let T : T<sub>L</sub> → T<sub>L′</sub> be a translation from L into L′, and let •∼ be finer than •≈. If T is valid up to •∼, then it is also valid up to •≈.

The quality of a translation depends on the choice of the equivalence or preorder up to which it is valid. Any two languages are equally expressive up to the universal equivalence, relating any two processes. Hence, the equivalence or preorder needs to be chosen carefully to match the intended applications of the languages under comparison. In general, as shown by Observation 1, using a finer equivalence or preorder yields a stronger claim that one language can be encoded in another. On the other hand, when separating two languages L and L′ by showing that L *cannot* be encoded in L′, a coarser equivalence yields a stronger claim.

**Observation 2.** The identity is a valid translation up to any preorder from any language into itself.

**Theorem 2.** If valid translations up to •∼ exist from L<sub>1</sub> into L<sub>2</sub> and from L<sub>2</sub> into L<sub>3</sub>, then there is a valid translation up to •∼ from L<sub>1</sub> into L<sub>3</sub>. ¶

Theorem 2 and Observation 2 show that the relation "being at least as expressive as up to •∼" is a preorder on languages.
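The natural route to Theorem 2 is to compose the two given translations and their underlying semantic translations; the relational part of that composition can be sketched as follows (the domains and relations below are invented purely for illustration).

```python
# Relation composition, in the spirit of composing semantic translations.
def compose(R2, R1):
    """(v'', v) in R2∘R1 iff some v' has (v'', v') in R2 and (v', v) in R1."""
    return {(v2, v0) for (v2, v1) in R2 for (w1, v0) in R1 if v1 == w1}

def is_semantic_translation(R, V_source):
    # Definition 2: every value of the source domain has a counterpart.
    return all(any(u == v for (_, u) in R) for v in V_source)

V1, V2, V3 = {"a"}, {"b1", "b2"}, {"c"}
R1 = {("b1", "a")}                  # a semantic translation from V1 into V2
R2 = {("c", "b1"), ("c", "b2")}     # a semantic translation from V2 into V3
R21 = compose(R2, R1)
assert R21 == {("c", "a")} and is_semantic_translation(R21, V1)
```

Since every source value keeps a counterpart through both relations, the composite is again a semantic translation, which is the key step behind transitivity.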

### **5 Closed-Term Languages**

The languages considered in this paper feature *variables*, *operators* of *arity* n ∈ IN, and/or other constructs. The set T<sub>L</sub> of L-expressions is inductively defined by:

– X ∈ T<sub>L</sub> for each variable X ∈ X,
– f(E<sub>1</sub>, …, E<sub>n</sub>) ∈ T<sub>L</sub> for each n-ary operator f of L and expressions E<sub>1</sub>, …, E<sub>n</sub> ∈ T<sub>L</sub>,
– and similar clauses for the other constructs of L.
Examples of other constructs are the infinite summation operator Σ<sub>i∈I</sub> E<sub>i</sub> of CCS, which takes arbitrarily many arguments, or the recursion construct μX.E, which has one argument, but *binds* all occurrences of X in that argument.

In general a construct has a number (possibly infinite) of argument expressions and it may bind certain variables within some of its arguments—the *scope* of the binding. An occurrence of a variable X in an expression is *bound* if it occurs within the scope of a construct that binds X, and *free* otherwise.

The semantics of such a language is given, in part, by a domain of values **V**, and an interpretation of each n-ary operator f of L as an n-ary operation f<sup>**V**</sup> : **V**<sup>n</sup> → **V** on **V**. Using the equations

$$[X]_{\mathcal{L}}(\rho) = \rho(X) \quad \text{and} \quad [f(E_1, \dots, E_n)]_{\mathcal{L}}(\rho) = f^{\mathbf{V}}([E_1]_{\mathcal{L}}(\rho), \dots, [E_n]_{\mathcal{L}}(\rho))$$

this allows an inductive definition of the meaning [E]<sub>L</sub> of an L-expression E. Moreover, [E]<sub>L</sub>(ρ) only depends on the restriction of ρ to the set *fv*(E) of variables occurring free in E.

The set T̊<sub>L</sub> ⊆ T<sub>L</sub> of *closed terms* of L consists of those L-expressions E ∈ T<sub>L</sub> with *fv*(E) = ∅. If P ∈ T̊<sub>L</sub> and **V** ≠ ∅ then [P]<sub>L</sub>(ρ) is independent of the choice of ρ : X → **V**, and is therefore denoted [P]<sub>L</sub>.

**Definition 9.** A *substitution* in L is a partial function σ : X ⇀ T<sub>L</sub> from the variables to the L-expressions. For a given L-expression E ∈ T<sub>L</sub>, E[σ] ∈ T<sub>L</sub> denotes the L-expression E in which each free occurrence of a variable X ∈ *dom*(σ) is replaced by σ(X), while renaming bound variables in E so as to avoid a free variable Y occurring in an expression σ(X) ending up bound in E[σ]. A substitution is *closed* if it has the form σ : X → T̊<sub>L</sub>.
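The renaming discipline of Definition 9 is the usual capture-avoiding substitution. A minimal Python sketch for an invented three-construct language (variables, a pair operator, and one binder); the renaming condition is over-approximated, so a bound variable may occasionally be renamed even when not strictly necessary:

```python
# Capture-avoiding substitution E[sigma], in the spirit of Definition 9.
# Invented language: ("var", x), ("pair", E, E'), and the binder ("bind", x, E).
import itertools

def free_vars(e):
    op, *args = e
    if op == "var": return {args[0]}
    if op == "pair": return free_vars(args[0]) | free_vars(args[1])
    if op == "bind": return free_vars(args[1]) - {args[0]}

_counter = itertools.count()
def fresh(avoid):
    while True:
        x = f"v{next(_counter)}"
        if x not in avoid: return x

def subst(e, sigma):
    op, *args = e
    if op == "var":
        return sigma.get(args[0], e)
    if op == "pair":
        return ("pair", subst(args[0], sigma), subst(args[1], sigma))
    if op == "bind":
        x, body = args
        sigma = {y: t for y, t in sigma.items() if y != x}
        incoming = set().union(*(free_vars(t) for t in sigma.values())) if sigma else set()
        if x in incoming:                 # rename the bound variable to avoid capture
            z = fresh(incoming | free_vars(body))
            body = subst(body, {x: ("var", z)})
            x = z
        return ("bind", x, subst(body, sigma))

# Substituting Y := X under a binder for X forces the binder to be renamed:
e = ("bind", "X", ("pair", ("var", "X"), ("var", "Y")))
r = subst(e, {"Y": ("var", "X")})
assert free_vars(r) == {"X"}              # the substituted X stays free
```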

An important class of languages used in concurrency theory are the ones where the distinction between syntax and semantics is effectively dropped by taking **V** = T̊<sub>L</sub>, i.e. where the domain of values in which the language is interpreted consists of the closed terms of the language. Here a valuation is the same as a closed substitution, and [E]<sub>L</sub>(ρ) for E ∈ T<sub>L</sub> and ρ : X → T̊<sub>L</sub> is defined to be E[ρ] ∈ T̊<sub>L</sub>. I will call such languages *closed-term* languages.

### **6 Translating a Synchronous into an Asynchronous** *π*

As an illustration of the concepts introduced above, consider the π-calculus as presented in [28], i.e., the one of [44] without matching, τ-prefixing, and choice.

Given a set of *names* N, the set T<sub>π</sub> of *process expressions* or *terms* E of the calculus is given by

$$E ::= X \quad | \quad \mathbf{0} \quad | \quad \bar{x}y.E \quad | \quad x(z).E \quad | \quad E|E' \quad | \quad (\nu z)E \quad | \quad !E$$

with x, y, z ranging over N, and X over X, the set of *process variables*. Process variables are not considered in [44], although they are common in languages like CCS [27] that feature a recursion construct. Since process variables form a central part of my notion of a valid or correct translation, here they have simply been added. This works generally. In Sect. 12 I show that for the purpose of assessing whether one language is as expressive as another, translations between them can be assumed to be compositional. This important result would be lost if process variables were dropped from the language. In that case compositionality would need to be stated as a separate requirement for valid translations.

Closed process expressions are called *processes*. The π-calculus is usually presented as a closed-term language, in that the semantic value associated with a closed term is simply itself. Yet, the real semantics is given by a reduction relation between processes, defined below.

**Definition 10.** An occurrence of a name z in a π-calculus process P ∈ T̊<sub>π</sub> is *bound* if it occurs within a subexpression x(z).P′ or (νz)P′ of P; otherwise it is *free*. Let n(P) (resp. bn(P)) be the set of names occurring (bound) in P ∈ T̊<sub>π</sub>. *Structural congruence*, ≡, is the smallest congruence relation on processes satisfying

$$\begin{array}{lll} P_1|(P_2|P_3) \equiv (P_1|P_2)|P_3 & \;!P \equiv P|!P & (\nu w)(P|Q) \equiv P|(\nu w)Q \\ P_1|P_2 \equiv P_2|P_1 & (\nu z)\mathbf{0} \equiv \mathbf{0} & x(z).P \equiv x(w).P\{w/z\} \\ P|\mathbf{0} \equiv P & (\nu z)(\nu w)P \equiv (\nu w)(\nu z)P & (\nu z)P \equiv (\nu w)P\{w/z\} \end{array}$$

Here the rightmost column only holds when w ∉ n(P), and P{w/z} denotes the process obtained by replacing each free occurrence of z in P by w.

**Definition 11.** The *reduction relation*, → ⊆ T̊<sub>π</sub> × T̊<sub>π</sub>, is generated by the following rules.

$$\frac{}{\bar{x}z.P \,|\, x(y).Q \to P \,|\, Q\{z/y\}}\;(z \notin \mathsf{bn}(Q)) \qquad \frac{P \to P'}{P|Q \to P'|Q} \qquad \frac{P \to P'}{(\nu z)P \to (\nu z)P'} \qquad \frac{P \equiv Q \quad Q \to Q' \quad Q' \equiv P'}{P \to P'}$$

Let =⇒ be the reflexive and transitive closure of →. The observable behaviour of π-calculus processes is often stated in terms of the outputs they can produce (abstracting from the value communicated on an output channel).

**Definition 12.** Let x ∈ N. A process P has a *strong output barb* on x, notation P↓<sub>x̄</sub>, if P can perform an output action x̄z. This is defined inductively:

$$(\bar{x}z.P){\downarrow_{\bar{x}}} \qquad \frac{P{\downarrow_{\bar{x}}}}{(P|Q){\downarrow_{\bar{x}}}} \qquad \frac{Q{\downarrow_{\bar{x}}}}{(P|Q){\downarrow_{\bar{x}}}} \qquad \frac{P{\downarrow_{\bar{x}}} \quad x \neq z}{((\nu z)P){\downarrow_{\bar{x}}}} \qquad \frac{P{\downarrow_{\bar{x}}}}{(!P){\downarrow_{\bar{x}}}}$$

A process P has a *weak output barb* on x, P⇓<sub>x̄</sub>, if there is a P′ with P =⇒ P′↓<sub>x̄</sub>.
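The inductive rules for strong output barbs read off directly as a recursive check. A Python sketch over an invented tuple encoding of π-terms (`("out", x, z, P)` for x̄z.P, `("in", x, z, P)` for x(z).P, `("nu", z, P)` for (νz)P, `("bang", P)` for !P, `("nil",)` for **0**):

```python
# Strong output barb P↓x̄, following the inductive rules of Definition 12.
def barb(p, x):
    op, *args = p
    if op == "out":                 # x̄z.P barbs exactly on its own channel
        return args[0] == x
    if op == "par":                 # P|Q barbs if either component does
        return barb(args[0], x) or barb(args[1], x)
    if op == "nu":                  # (νz)P barbs on x only when z ≠ x
        z, body = args
        return z != x and barb(body, x)
    if op == "bang":                # !P barbs iff P does
        return barb(args[0], x)
    return False                    # 0 and inputs x(z).P have no output barb

# (νu)(x̄u.0 | u(v).0) has a barb on x, but its barb on u is hidden by ν:
p = ("nu", "u", ("par", ("out", "x", "u", ("nil",)), ("in", "u", "v", ("nil",))))
assert barb(p, "x") and not barb(p, "u")
```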

A common semantic equivalence applied in the π-calculus is *weak barbed congruence* [29,44].

**Definition 13.** *Weak (output) barbed bisimilarity* is the largest symmetric relation •≈ ⊆ T̊<sub>π</sub> × T̊<sub>π</sub> such that

– P •≈ Q and P↓<sub>x̄</sub> implies Q⇓<sub>x̄</sub>, and
– P •≈ Q and P =⇒ P′ implies Q =⇒ Q′ for some Q′ with P′ •≈ Q′.

*Weak barbed congruence*, ≅<sup>c</sup>, is the largest congruence included in •≈.

Often *input barbs*, defined similarly, are included in the definition of weak barbed bisimilarity [44]. This is known to induce the same notion of weak barbed congruence [44]. Another technique for defining weak barbed congruence is to use a barb, or set of barbs, external to the language under investigation, that are added to the language as constants [21], similar to the theory of testing of [9]. This method is useful for languages with a reduction semantics that do not feature a clear notion of barb, or where there is ambiguity about which barbs should be counted and which not, or for comparing languages with different kinds of barb.

**Example 1.** x̄z.**0** ≇<sup>c</sup> (νu)(x̄u.**0** | u(v).v̄z.**0**). For let E := X | x(u).ūv.**0** with ρ(X) = x̄z.**0** and ζ(X) = (νu)(x̄u.**0** | u(v).v̄z.**0**). Then E[ζ] → (νu)(u(v).v̄z.**0** | ūv.**0**) → v̄z.**0**, which has the barb ↓<sub>v̄</sub>, whereas E[ρ] has no weak barb ⇓<sub>v̄</sub>.

The asynchronous π-calculus, as introduced by Honda and Tokoro in [24] and by Boudol in [4], is the sublanguage aπ of the fragment π of the π-calculus presented above where all subexpressions x̄y.E have the form x̄y.**0**. *Asynchronous barbed congruence*, ≅<sup>c</sup><sub>a</sub>, is the largest congruence *for the asynchronous π-calculus* included in •≈. Since aπ is a sublanguage of π, ≅<sup>c</sup><sub>a</sub> is at least as coarse an equivalence as ≅<sup>c</sup>, i.e. ≅<sup>c</sup> ⊆ ≅<sup>c</sup><sub>a</sub>. The inclusion is strict, since !x(z).x̄z.**0** ≅<sup>c</sup><sub>a</sub> **0**, yet !x(z).x̄z.**0** ≇<sup>c</sup> **0** [44]. Since all expressions used in Example 1 belong to aπ, one even has x̄z.**0** ≇<sup>c</sup><sub>a</sub> (νu)(x̄u.**0** | u(v).v̄z.**0**).

Boudol [4] defined a translation T from π to aπ inductively as follows:

$$\begin{array}{ll} \mathcal{T}(X) = X & \text{for } X \in \mathcal{X}\\ \mathcal{T}(\mathbf{0}) = \mathbf{0}\\ \mathcal{T}(\bar{x}z.P) = (\nu u)(\bar{x}u \,|\, u(v).(\bar{v}z \,|\, \mathcal{T}(P))) & \text{choosing } u, v \notin \mathsf{n}(P),\ u \neq v\\ \mathcal{T}(x(y).P) = x(u).(\nu v)(\bar{u}v \,|\, v(y).\mathcal{T}(P)) & \text{choosing } u, v \notin \mathsf{n}(P),\ u \neq v\\ \mathcal{T}(P|Q) = (\mathcal{T}(P) \,|\, \mathcal{T}(Q))\\ \mathcal{T}(!P) = \,!\mathcal{T}(P)\\ \mathcal{T}((\nu x)P) = (\nu x)\mathcal{T}(P) \end{array}$$
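Boudol's translation is syntax-directed, so it can be transcribed as a recursive function. A Python sketch over an invented tuple encoding of π-terms (`("out", x, z, P)` for x̄z.P, `("in", x, y, P)` for x(y).P, and so on); fresh names are drawn from a reserved prefix, standing in for the side condition u, v ∉ n(P):

```python
# A transcription of Boudol's translation T over an invented AST encoding.
import itertools

_ctr = itertools.count()
def fresh():
    # names with a reserved "_u" prefix, assumed absent from source terms
    return f"_u{next(_ctr)}"

def T(p):
    op, *args = p
    if op in ("var", "nil"):
        return p
    if op == "out":                       # T(x̄z.P) = (νu)(x̄u | u(v).(v̄z | T(P)))
        x, z, cont = args
        u, v = fresh(), fresh()
        return ("nu", u, ("par", ("out", x, u, ("nil",)),
                          ("in", u, v, ("par", ("out", v, z, ("nil",)), T(cont)))))
    if op == "in":                        # T(x(y).P) = x(u).(νv)(ūv | v(y).T(P))
        x, y, cont = args
        u, v = fresh(), fresh()
        return ("in", x, u, ("nu", v, ("par", ("out", u, v, ("nil",)),
                                       ("in", v, y, T(cont)))))
    if op == "par":
        return ("par", T(args[0]), T(args[1]))
    if op == "bang":
        return ("bang", T(args[0]))
    if op == "nu":
        return ("nu", args[0], T(args[1]))

def is_async(p):
    # in aπ every output prefix must have continuation 0
    op, *args = p
    if op == "out":
        return args[2] == ("nil",)
    return all(is_async(a) for a in args if isinstance(a, tuple))

src = ("out", "x", "z", ("in", "x", "y", ("nil",)))   # x̄z.x(y).0, synchronous
assert is_async(T(src))                               # its translation lies in aπ
```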

Example 1 shows that T is not valid up to ≅<sup>c</sup>. In fact, it is not even valid up to ≅<sup>c</sup><sub>a</sub>. However, as shown in [25], it is valid up to •≈. Since •≈ is not a congruence (for π or aπ) it is not correct up to •≈.

### **7 Congruence Closure**

**Definition 14.** An equivalence relation ∼ is a *1-hole congruence* for a language L interpreted in a semantic domain **V** if [E]<sub>L</sub>(ν) ∼ [E]<sub>L</sub>(ρ) for any L-expression E and any valuations ν, ρ : X → **V** with ν ∼<sub>1</sub> ρ. Here ν, ρ are ∼<sub>1</sub>-*equivalent*, ν ∼<sub>1</sub> ρ, if ν(X) ∼ ρ(X) for some X ∈ X and ν(Y) = ρ(Y) for all variables Y ≠ X.

An n-*hole congruence* for any finite n ∈ IN can be defined in the same vein, and it is well known and easy to check that a 1-hole congruence ∼ is also an n-hole congruence, for any n ∈ IN. However, in the presence of operators with infinitely many arguments, a 1-hole congruence need not be a congruence.

**Example 2.** Let **V** be (IN × IN) ∪ {∞}, with the well-order ≤ on **V** inherited lexicographically from the default order on IN, and ∞ the largest element. So (n, m) ≤ (n′, m′) iff n < n′ ∨ (n = n′ ∧ m ≤ m′). Consider the language L with constants 0, 1 and (1), interpreted in **V** as (0, 0), (1, 0) and (0, 1), respectively, the binary operator +, interpreted by (n<sub>1</sub>, m<sub>1</sub>) +<sup>**V**</sup> (n<sub>2</sub>, m<sub>2</sub>) = (n<sub>1</sub>+n<sub>2</sub>, m<sub>1</sub>+m<sub>2</sub>) and ∞+v = v+∞ = ∞, and the constructs sup(E<sub>i</sub>)<sub>i∈I</sub>, one for each index set I, taking any number of arguments. The interpretation of sup in **V** is to take the supremum of its arguments w.r.t. the well-order ≤. In case sup is given finitely many arguments, it simply returns the largest. However, sup((n, i))<sub>i∈IN</sub> = (n+1, 0).

Now let the equivalence relation ∼ on **V** be defined by (n, m) ∼ (n′, m′) iff n = n′, leaving ∞ in an equivalence class of its own. This relation is a 1-hole congruence on L. Hence, it is also a 2-hole congruence, so one has

$$\big((n_1, m_1) \sim (n_1', m_1') \wedge (n_2, m_2) \sim (n_2', m_2')\big) \;\Rightarrow\; (n_1, m_1) + (n_2, m_2) \sim (n_1', m_1') + (n_2', m_2').$$

Yet it fails to be a congruence: (n, i) ∼ (n, 0) for all i ∈ IN, but

$$(n{+}1, 0) = \sup((n, i))_{i \in I\!N} \;\not\sim\; \sup((n, 0))_{i \in I\!N} = (n, 0).$$
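The finitary part of Example 2 can be checked mechanically, while the infinitary failure is noted analytically in a comment. A Python sketch over a small sample of pairs (the sampling bound is arbitrary):

```python
# Brute-force check of the 2-hole congruence property of ~
# ((n, m) ~ (n', m') iff n = n') for + on a finite sample of pairs.
from itertools import product

def equiv(a, b):
    return a[0] == b[0]

def plus(a, b):
    return (a[0] + b[0], a[1] + b[1])

sample = [(n, m) for n in range(3) for m in range(3)]
for a, a2, b, b2 in product(sample, repeat=4):
    if equiv(a, a2) and equiv(b, b2):
        assert equiv(plus(a, b), plus(a2, b2))

# For the infinitary sup the analogous property fails:
# sup((n, i) for i in IN) = (n+1, 0), while sup((n, 0) for i in IN) = (n, 0),
# and (n+1, 0) is not ~-related to (n, 0).
```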

It is well known and easy to check that the collection of equivalence relations on any domain **V**, ordered by inclusion, forms a complete lattice: namely, the intersection of arbitrarily many equivalence relations is again an equivalence relation. Likewise, the collection of 1-hole congruences for L is also a complete lattice, and moreover a complete sublattice of the complete lattice of equivalence relations on **V**. The latter implies that for any collection C of 1-hole congruence relations, the least equivalence relation that contains all elements of C (exists and) happens to be a 1-hole congruence relation. Again, this is a property that is well known [22] and easy to prove. It follows that for any equivalence relation ∼ there exists a largest 1-hole congruence for L contained in ∼. I will denote this 1-hole congruence by ∼<sup>1c</sup><sub>L</sub>, and call it the *congruence closure* of ∼ w.r.t. L. One has v<sub>1</sub> ∼<sup>1c</sup><sub>L</sub> v<sub>2</sub> for v<sub>1</sub>, v<sub>2</sub> ∈ **V** iff [E]<sub>L</sub>(ν) ∼ [E]<sub>L</sub>(ρ) for any L-expression E and any valuations ν, ρ : X → **V** with ν(X) = v<sub>1</sub> and ρ(X) = v<sub>2</sub> for some X ∈ X and ν(Y) = ρ(Y) for all Y ≠ X. Such results do not generally hold for congruences.
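On a finite domain with unary operations, the congruence closure ∼<sup>1c</sup><sub>L</sub> can be computed by partition refinement: repeatedly split classes of ∼ until every operation respects the partition. A Python sketch (the domain, initial equivalence and operation are invented for illustration; n-ary operators would be handled through their unary sections):

```python
# Largest 1-hole congruence contained in ~, on a finite domain with unary ops,
# computed by partition refinement (a sketch with invented data).
def congruence_closure(V, equiv_classes, ops):
    # equiv_classes: list of lists partitioning V; ops: unary functions on V
    def class_of(part, v):
        return next(i for i, c in enumerate(part) if v in c)
    part = [list(c) for c in equiv_classes]
    changed = True
    while changed:
        changed = False
        new_part = []
        for c in part:
            # group the elements of a class by where the ops send them
            sig = {}
            for v in c:
                key = tuple(class_of(part, f(v)) for f in ops)
                sig.setdefault(key, []).append(v)
            new_part.extend(sig.values())
            changed |= len(sig) > 1
        part = new_part
    return part

# Domain {0,1,2,3}, ~ = {0,1,2} vs {3}, one operation f with f(2) = 3:
# 2 must be split off, since f sends it outside the class of f(0) = f(1) = 0.
f = lambda v: {0: 0, 1: 0, 2: 3, 3: 3}[v]
closure = congruence_closure([0, 1, 2, 3], [[0, 1, 2], [3]], [f])
assert sorted(map(sorted, closure)) == [[0, 1], [2], [3]]
```

The refinement only ever splits classes that the operations force apart, so the stable partition it reaches is the coarsest ∼-refining partition respected by all operations.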

**Example 3.** Continue Example 2, but skipping the operator +. Let ∼<sub>k</sub> be the equivalence on **V** defined by (n, m) ∼<sub>k</sub> (n′, m′) iff n = n′ ∧ (m = m′ ∨ m, m′ ≤ k). It is easy to check that all ∼<sub>k</sub> for k ∈ IN are congruences on the reduced L, and contained in ∼. Yet their least upper bound (in the lattice of equivalence relations on **V**) is ∼, which is not a congruence itself. In particular, there is no largest congruence contained in ∼.

When dealing with languages L in which all operators and other constructs have a finite arity, so that each E ∈ T<sub>L</sub> contains only finitely many variables, there is no difference between a congruence and a 1-hole congruence, and thus ∼<sup>1c</sup><sub>L</sub> is a congruence relation for any equivalence ∼. I will apply the theory of expressiveness presented in this paper also to languages like CCS that have operators (such as Σ<sub>i∈I</sub> E<sub>i</sub>) of infinite arity. However, in all such cases I am currently aware of, the relevant choices of L and ∼ have the property that ∼<sup>1c</sup><sub>L</sub> is in fact a congruence relation. As an example, consider weak bisimilarity [27]. This equivalence relation fails to be a congruence for Σ. However, the coarsest 1-hole congruence contained in this relation, often called *rooted* weak bisimilarity, happens to be a congruence. In fact, when congruence-closing weak bisimilarity w.r.t. the binary sum, the result [15] is also a congruence for the infinitary sum, as well as for all other operators of CCS [27].

**Definition 15.** Let T be a translation from L into L′. A subset **W** of **V**′ is *closed under* T(L) if [T(E)](η) ∈ **W** for any expression E ∈ T<sub>L</sub> and valuation η : X → **W**. An equivalence ∼ on **W** is a *congruence* (respectively *1-hole congruence*) for T(L) on **W** if for any E ∈ T<sub>L</sub> and θ, η : X → **W** with θ ∼ η (respectively θ ∼<sup>1</sup> η) one has [T(E)]<sub>L′</sub>(θ) ∼ [T(E)]<sub>L′</sub>(η).

**Proposition 3.** Let T be a translation from L into L′ that is correct w.r.t. a semantic translation **R** ⊆ **V**′ × **V**. Let **R**(**V**) := {*v*′ ∈ **V**′ | ∃*v* ∈ **V**. *v*′ **R** *v*}. Then **R**(**V**) is closed under T(L).

**Proof:** Let E ∈ T<sub>L</sub> and η : X → **R**(**V**). Take ρ : X → **V** with η **R** ρ. Then [T(E)]<sub>L′</sub>(η) **R** [E]<sub>L</sub>(ρ). Since [E]<sub>L</sub>(ρ) ∈ **V** one has [T(E)]<sub>L′</sub>(η) ∈ **R**(**V**).

**Proposition 4.** Let the translation T from L into L′ be correct w.r.t. the semantic translation **R** ⊆ ∼. Then ∼ is a (1-hole) congruence for L iff it is a (1-hole) congruence for T(L) on **R**(**V**).

**Proof:** First suppose ∼ is a congruence for L. Let E ∈ T<sub>L</sub> and θ, η : X → **R**(**V**) with θ ∼ η. By the definition of **R**(**V**) there are valuations ν, ρ : X → **V** with θ **R** ν and η **R** ρ. Now ν ∼ θ ∼ η ∼ ρ, so

$$[\mathcal{T}(E)]_{\mathcal{L}'}(\theta)\ \mathbf{R}\ [E]_{\mathcal{L}}(\nu) \sim [E]_{\mathcal{L}}(\rho)\ \mathbf{R}^{-1}\ [\mathcal{T}(E)]_{\mathcal{L}'}(\eta)$$

and hence [T(E)]<sub>L′</sub>(θ) ∼ [T(E)]<sub>L′</sub>(η). The other direction proceeds in the same way.

Now suppose ∼ is a 1-hole congruence for L. Let E ∈ T<sub>L</sub> and θ, η : X → **R**(**V**) with θ ∼<sup>1</sup> η. Then θ(X) ∼ η(X) for some X ∈ X and θ(Y) = η(Y) for all Y ≠ X. So there must be ν, ρ : X → **V** with θ **R** ν, η **R** ρ and ν(Y) = ρ(Y) for all Y ≠ X. Since ν(X) ∼ θ(X) ∼ η(X) ∼ ρ(X) it follows that ν ∼<sup>1</sup> ρ. The conclusion proceeds as above, and the other direction goes likewise.

The requirement of being a congruence for T(L) on **R**(**V**) is slightly weaker than that of being a congruence for T(L)—cf. Definition 8—for it restricts attention to valuations into **R**(**V**) ⊆ **U**. ¶

### **8 A Congruence Closure Property for Valid Translations**

In many applications, semantic values in the domain of interpretation of a language L are only meaningful up to a semantic equivalence ∼<sup>c</sup>, and the intended semantic domain could just as well be seen as the set of ∼<sup>c</sup>-equivalence classes of values. For this purpose it is essential that ∼<sup>c</sup> is a congruence for L. Often ∼<sup>c</sup> is the congruence closure of a coarser semantic equivalence ∼, so that two values end up being identified iff they are ∼-equivalent in every context. An example of this occurred in Sect. 6, with <sup>•</sup>≈ in the rôle of ∼ and ≅<sup>c</sup> in the rôle of ∼<sup>c</sup>. Now Theorem 4, contributed in this section, says that if a translation from L into L′ is valid up to ∼, then it is even valid up to an equivalence ∼<sup>1c</sup><sub>L,**R**</sub> that extends ∼<sup>c</sup> from **V** to a subdomain **W** of **V**′ that suffices for the interpretation of translated expressions from L. This equivalence ∼<sup>1c</sup><sub>L,**R**</sub> coincides with the congruence closure of ∼ on L, as well as on T(L), and melts each equivalence class of **V** with exactly one of **W**, and vice versa.

Let L and L′ be languages with [ ]<sub>L</sub> : T<sub>L</sub> → ((X → **V**) → **V**) and [ ]<sub>L′</sub> : T<sub>L′</sub> → ((X → **V**′) → **V**′). In this section I assume that **V** ∩ **V**′ = ∅. To apply the results to the general case, just adapt L′ by using a copy of **V**′—any preorder <sup>•</sup>∼ on **V** ∪ **V**′ extends to this copy by considering each copied element <sup>•</sup>∼-equivalent to the original.

**Definition 16.** Given any semantic translation **R**, let ≡<sub>**R**</sub> ⊆ (**V** ∪ **V**′)<sup>2</sup> be the smallest equivalence relation on **V** ∪ **V**′ containing **R**.

**Theorem 3.** If a translation T is correct w.r.t. the semantic translation **R**, then ≡<sub>**R**</sub> is a 1-hole congruence for L. ¶

By Proposition 4, ≡<sub>**R**</sub> also is a 1-hole congruence for T(L) on **R**(**V**). Only the subset **R**(**V**) of **V**′ matters for the purpose of translating L into L′. On **V**′ \ **R**(**V**) the equivalence ≡<sub>**R**</sub> is the identity.
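On finite carriers, the smallest equivalence relation containing a relation **R** (Definition 16) is just the reflexive–symmetric–transitive closure of **R**, which a union-find structure computes directly. A sketch, with invented value names:

```python
def equivalence_closure(elements, pairs):
    """Classes of the smallest equivalence on `elements` containing `pairs`."""
    parent = {x: x for x in elements}

    def find(x):                               # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:                         # merge the classes of each pair
        parent[find(a)] = find(b)

    classes = {}
    for x in elements:
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())

V, Vp = {"v1", "v2"}, {"w1", "w2", "w3"}
R = {("w1", "v1"), ("w2", "v1")}               # a toy R ⊆ V′ × V
print(sorted(map(sorted, equivalence_closure(V | Vp, R))))
# → [['v1', 'w1', 'w2'], ['v2'], ['w3']]
```

Note how ≡<sub>**R**</sub> is the identity on the elements outside **R**(**V**)—here `v2` and `w3` remain singletons.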

**Theorem 4.** Let T be a translation from a language L, with semantic domain **V**, into a language L′, with domain **V**′, that is valid up to a semantic equivalence ∼. Then T is even valid up to a semantic equivalence ∼<sup>1c</sup><sub>L,**R**</sub>, contained in ∼, such that (1) the restriction of ∼<sup>1c</sup><sub>L,**R**</sub> to **V** is the largest 1-hole congruence for L contained in ∼, (2) the set **W** := {*v*′ ∈ **V**′ | ∃*v* ∈ **V**. *v* ∼<sup>1c</sup><sub>L,**R**</sub> *v*′} is closed under T(L), and (3) the restriction of ∼<sup>1c</sup><sub>L,**R**</sub> to **W** is the largest 1-hole congruence for T(L) on **W** that is contained in ∼. ¶

Note that each equivalence class of ∼<sup>1c</sup><sub>L,**R**</sub> on **V** ∪ **W** melts an equivalence class of ∼<sup>1c</sup><sub>L,**R**</sub> on **V** with one of ∼<sup>1c</sup><sub>L,**R**</sub> on **W**. Moreover, on **V** the relation is completely determined by L and ∼. However, in general the whole relation ∼<sup>1c</sup><sub>L,**R**</sub> is not completely determined by L and ∼. ¶

**Corollary 1.** Let T be a translation from a language L, with semantic domain **V**, into a language L′, with domain **V**′, valid up to a semantic equivalence ∼, and suppose the congruence closure ∼<sup>1c</sup><sub>L</sub> of ∼ w.r.t. L is in fact a congruence. Then T is correct up to the equivalence ∼<sup>1c</sup><sub>L,**R**</sub> described in Theorem 4. ¶

The languages π and aπ of Sect. 6 do not feature operators (or other constructs) of infinite arity. Hence the congruence closure ∼<sup>1c</sup><sub>π</sub> or ∼<sup>1c</sup><sub>aπ</sub> of an equivalence ∼ on π or aπ is always a congruence. So by Corollary 1 Boudol's translation T is correct up to an equivalence <sup>•</sup>≈<sup>c</sup><sub>π,**R**</sub>, defined on the disjoint union of the domains T<sub>π</sub> and T<sub>aπ</sub> on which the two languages are interpreted. This equivalence is contained in <sup>•</sup>≈, and on the source domain T<sub>π</sub> coincides with ≅<sup>c</sup>. By Theorem 4, the restriction of <sup>•</sup>≈<sup>c</sup><sub>π,**R**</sub> to a subdomain **W** ⊆ T<sub>aπ</sub> is the largest congruence for T(π) on **W** that is contained in ∼. As ≅<sup>c</sup><sub>a</sub> is a congruence for all of aπ on all of T<sub>aπ</sub>, and contained in <sup>•</sup>≈, it is certainly a congruence for T(π) on **W**, and thus contained in <sup>•</sup>≈<sup>c</sup><sub>π,**R**</sub>. This inclusion turns out to be strict. As an illustration of that, note that x̄z.**0**|x̄z.**0** ≅<sup>c</sup> x̄z.x̄z.**0**. (This follows since these processes are strong (early) bisimilar [44] and thus strong full bisimilar by [44, Definition 2.2.2].) Consequently, their translations must be related by <sup>•</sup>≈<sup>c</sup><sub>π,**R**</sub>. So, for distinct u, v, y, w, x, z ∈ N,

$$(u)(\bar{x}u \,|\, u(v).(\bar{v}z \,|\, \mathbf{0})) \;\Big|\; (u)(\bar{x}u \,|\, u(v).(\bar{v}z \,|\, \mathbf{0})) \;\stackrel{\bullet}{\approx}^{c}_{\pi,\mathbf{R}}\; (y)(\bar{x}y \,|\, y(w).(\bar{w}z \,|\, (u)(\bar{x}u \,|\, u(v).(\bar{v}z \,|\, \mathbf{0}))))$$

Yet, these processes are not ≅<sup>c</sup><sub>a</sub>-equivalent, as can be seen by putting them in a context x(y).x(y).r̄⟨s⟩ | X. There, only the left-hand side has a weak barb ⇓<sub>r̄</sub>.

### **9 Integrating Language Features Through Translations**

The results of the previous section show how valid translations are satisfactory for comparing the expressiveness of languages. If there is a valid translation T from L to L′ up to ∼, and (as usual) ∼<sup>1c</sup><sub>L</sub> is a congruence, then all truths that can be expressed in terms of L can be mimicked in L′. For the congruence classes of ∼<sup>1c</sup><sub>L</sub> translate bijectively to congruence classes of an induced equivalence relation on the domain of T(L) (within the domain of L′), and all operations on those congruence classes that can be performed by contexts of L have a perfect counterpart in terms of contexts of T(L). This state of affairs was illustrated on Boudol's translation from a synchronous to an asynchronous π-calculus.

There is however one desirable property of translations between languages that has not yet been achieved, namely to combine the powers of two languages into one unified language. If both languages L<sub>1</sub> and L<sub>2</sub> have valid translations into a language L′, then all that can be done with L<sub>1</sub> can be mimicked in a fragment of L′, and all that can be done with L<sub>2</sub> can be mimicked in another fragment of L′. In order for these two fragments to combine, one would like to employ a single congruence relation on L′ that specialises to congruence relations for T<sub>1</sub>(L<sub>1</sub>) and T<sub>2</sub>(L<sub>2</sub>), which form the counterparts of relevant congruence relations for the source languages L<sub>1</sub> and L<sub>2</sub>.

In terms of the translation T from π to aπ, the equivalence ≅<sup>c</sup><sub>a</sub> on T<sub>aπ</sub> would be the right congruence relation to consider for aπ. Ideally, this congruence would extend to an equivalence ≅<sup>c</sup><sub>π,aπ</sub> on the disjoint union T<sub>π</sub> ⊎ T<sub>aπ</sub>, such that the restriction of ≅<sup>c</sup><sub>π,aπ</sub> to T<sub>π</sub> is a congruence for π. Necessarily, this congruence on T<sub>π</sub> would have to distinguish the terms x̄z.**0**|x̄z.**0** and x̄z.x̄z.**0**, since their translations are distinguished by ≅<sup>c</sup><sub>a</sub>. One therefore expects ≅<sup>c</sup><sub>π,aπ</sub> on T<sub>π</sub> to be strictly finer than ≅<sup>c</sup>. Here it is important that the union of T<sub>π</sub> and T<sub>aπ</sub> on which this congruence is defined is required to be disjoint. For if one considers T<sub>aπ</sub> as a subset of T<sub>π</sub>, then we obtain that the restriction of ≅<sup>c</sup><sub>π,aπ</sub> to that subset (1) coincides with ≅<sup>c</sup><sub>a</sub> and (2) is strictly finer than ≅<sup>c</sup>. This contradicts the fact that ≅<sup>c</sup> is strictly finer than ≅<sup>c</sup><sub>a</sub>.

In Sect. 12 I will show that such a congruence ≅<sup>c</sup><sub>π,aπ</sub> indeed exists. In fact, under a few very mild conditions this result holds generally, provided that the source language L is a closed-term language. ¶

### **10 A Unique Decomposition of Terms**

The results of Sect. 12 apply only to languages satisfying two postulates, formulated below, and to preorders <sup>•</sup>∼ that "respect <sup>α</sup>=", defined in Sect. 11.

**Definition 17.** α-*conversion* is the act of renaming all occurrences of a bound variable X within the scope of its binding into another variable, say Y , while avoiding capture of free variables. Here one speaks of *capture* when a free occurrence of Y turns into a bound one.

Write E <sup>α</sup>= F if expression E can be converted into F by acts of α-conversion.

In languages where there are multiple types of bound variables, <sup>α</sup>= allows conversion of all of them. In a π-calculus with recursion, for instance, there could be bound process variables X ∈ X as well as bound names x ∈ N. The last two conversions in the right column of Definition 10 define α-conversion for names.
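The capture side-condition of Definition 17 can be made concrete on λ-like terms. The sketch below (term representation and names invented for illustration) renames the variable bound at the root of an abstraction, refusing any rename that would turn a free occurrence into a bound one:

```python
# Terms: ("var", x) | ("lam", x, body) | ("app", t1, t2).

def free_vars(t):
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def rename(t, x, y):
    """Replace free occurrences of variable x by y."""
    tag = t[0]
    if tag == "var":
        return ("var", y) if t[1] == x else t
    if tag == "lam":
        if t[1] == x:                  # x rebound below: nothing free to rename
            return t
        if t[1] == y and x in free_vars(t[2]):
            raise ValueError("inner binder would capture the renamed variable")
        return ("lam", t[1], rename(t[2], x, y))
    return ("app", rename(t[1], x, y), rename(t[2], x, y))

def alpha_convert(t, y):
    """Rename the variable bound at the root of a λ into y, avoiding capture."""
    assert t[0] == "lam"
    x, body = t[1], t[2]
    if y in free_vars(body) - {x}:     # a free y would get captured
        raise ValueError(f"renaming {x} to {y} would capture a free {y}")
    return ("lam", y, rename(body, x, y))

# λx.(x y)  α=  λz.(z y), but renaming x into y is refused:
t = ("lam", "x", ("app", ("var", "x"), ("var", "y")))
print(alpha_convert(t, "z"))
# → ('lam', 'z', ('app', ('var', 'z'), ('var', 'y')))
```

Calling `alpha_convert(t, "y")` raises, since the free occurrence of y would turn into a bound one—exactly the notion of *capture* above.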

**Postulate 1 (**[16]**, paraphrased).** There exists a class of expressions called *standard heads*, and a class of substitutions called *standard substitutions*, such that for each expression E that is not a variable there is a unique standard head H and a unique standard substitution σ such that E <sup>α</sup>= H[σ].

A term f(c, g(c)), for instance, can be written as H[σ] where H = f(X<sub>1</sub>, X<sub>2</sub>) is a head, and σ : {X<sub>1</sub>, X<sub>2</sub>} → T<sub>L</sub> is given by σ(X<sub>1</sub>) = c and σ(X<sub>2</sub>) = g(c). The head H is standardised by means of a particular (arbitrary) choice for its argument variables X<sub>1</sub> and X<sub>2</sub>; σ is standardised through a particular choice of the bound variables that may occur in the expressions σ(X). A head for a recursive expression μX.f(g(c), g(g(X))) is μX.f(Y, g(g(X))). See [16] for further detail.
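For a plain first-order term language the decomposition of the example is mechanical: keep the top operator, replace its arguments by numbered variables, and record the arguments in the substitution. A sketch (the tuple term representation, with constants as 0-ary tuples, is invented; binders like μ are out of scope here):

```python
def decompose(term):
    """Split a non-variable term (op, arg1, ..., argn) into a standard head
    (op, X1, ..., Xn) and the substitution sigma with sigma(Xi) = argi."""
    if isinstance(term, str):                   # a bare variable has no head
        raise ValueError("variables are not decomposed")
    op, *args = term
    head_vars = [f"X{i + 1}" for i in range(len(args))]
    head = (op, *head_vars)
    sigma = dict(zip(head_vars, args))
    return head, sigma

head, sigma = decompose(("f", ("c",), ("g", ("c",))))
print(head)    # → ('f', 'X1', 'X2')
print(sigma)   # → {'X1': ('c',), 'X2': ('g', ('c',))}
```

The standardisation consists precisely in always choosing the variables X1, X2, ... in this fixed order, which makes the head and substitution unique.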

This postulate is easy to show for each common type of system description language, and I am not aware of any counterexamples. However, while striving for maximal generality, I consider languages with (recursion-like) constructs that are yet to be invented, and in view of those, this principle has to be postulated rather than derived.

### **11 Invariance of Meaning Under** *α***-conversion**

Write *v* <sup>α</sup>=<sub>L</sub> *w*, with *v*, *w* ∈ **V**, iff there are terms E, F ∈ T<sub>L</sub> with E <sup>α</sup>= F, and a valuation ζ : X → **V** such that [E]<sub>L</sub>(ζ) = *v* and [F]<sub>L</sub>(ζ) = *w*. This relation is reflexive and symmetric.

In [16] I limited attention to languages satisfying

$$\textit{if } E \stackrel{\alpha}{=} F \textit{ then } \lbrack E \rbrack_{\mathcal{L}} = \lbrack F \rbrack_{\mathcal{L}}.\tag{2}$$

This postulate says that the meaning of an expression is invariant under α-conversion. It can be reformulated as the requirement that <sup>α</sup>=<sub>L</sub> is the identity relation. This postulate is satisfied by all my intended applications, except for the important class of closed-term languages. Languages like CCS and the π-calculus can be regarded as falling in this class (although it is also possible to declare the meaning of a term under a valuation to be an <sup>α</sup>=-equivalence class of closed terms). To bring this type of application within the scope of my theory, here I weaken this postulate by requiring merely that <sup>α</sup>=<sub>L</sub> is an equivalence.

**Postulate 2.** <sup>α</sup>=<sub>L</sub> is an equivalence relation.

This postulate is needed in Sect. 12. I also need to restrict attention to preorders <sup>•</sup>∼ with <sup>α</sup>=<sub>L</sub> ⊆ <sup>•</sup>∼. When that holds I say that the preorder <sup>•</sup>∼ *respects* <sup>α</sup>=<sub>L</sub>. If (2) holds—which strengthens Postulate 2—then *any* preorder respects <sup>α</sup>=<sub>L</sub>.

### **12 Compositionality**

An important property of translations, defined below, is *compositionality*. In this section I show that any valid translation up to a preorder <sup>•</sup>∼ can be modified into a translation that moreover is compositional, provided one restricts attention to languages that satisfy Postulates 1 and 2, and preorders <sup>•</sup>∼ that respect <sup>α</sup>=.

**Definition 18.** A translation T from L into L′ is *compositional* if

(1) T(E[σ]) <sup>α</sup>= T(E)[T ∘ σ] for each E ∈ T<sub>L</sub> and σ : X ⇀ T<sub>L</sub>,
(2) E <sup>α</sup>= F implies T(E) <sup>α</sup>= T(F) for all E, F ∈ T<sub>L</sub>, and
(3) T(X) = X for each X ∈ X.
In case E = f(t<sub>1</sub>,...,t<sub>n</sub>) for certain t<sub>i</sub> ∈ T<sub>L</sub> this amounts to T(f(t<sub>1</sub>,...,t<sub>n</sub>)) <sup>α</sup>= E<sub>f</sub>(T(t<sub>1</sub>),..., T(t<sub>n</sub>)), where E<sub>f</sub> := T(f(X<sub>1</sub>,...,X<sub>n</sub>)) and E<sub>f</sub>(u<sub>1</sub>,...,u<sub>n</sub>) denotes the result of the simultaneous substitution in this expression of the terms u<sub>i</sub> ∈ T<sub>L′</sub> for the free variables X<sub>i</sub>, for i = 1,...,n. The first requirement of Definition 18 is more general and covers language constructs other than functions, such as recursion. Requiring equality rather than <sup>α</sup>= is too demanding. ¶
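The function case can be sketched directly: give each source operator f a target template E<sub>f</sub> and translate by substituting the translated arguments for X<sub>1</sub>,...,X<sub>n</sub>. The operators and templates below are made up for illustration (and binders, hence α-conversion, are ignored):

```python
def subst(expr, sigma):
    """Simultaneous substitution of terms for free variables in expr."""
    if isinstance(expr, str):                  # a variable
        return sigma.get(expr, expr)
    return (expr[0],) + tuple(subst(e, sigma) for e in expr[1:])

def translate(term, templates):
    if isinstance(term, str):                  # clause T(X) = X
        return term
    op, *args = term
    template = templates[op]                   # E_f := T(f(X1,...,Xn))
    sigma = {f"X{i + 1}": translate(a, templates) for i, a in enumerate(args)}
    return subst(template, sigma)

# translate source "plus"/"zero" into a target with operators "add2"/"zero":
templates = {"plus": ("add2", "X1", "X2"), "zero": ("zero",)}
print(translate(("plus", "Y", ("plus", ("zero",), "Y")), templates))
# → ('add2', 'Y', ('add2', ('zero',), 'Y'))
```

Because the template E<sub>f</sub> depends only on f—not on the arguments or their free names—this sketch satisfies the strong compositionality requirement discussed in Sect. 13.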

**Lemma 1.** If T<sub>1</sub> : T<sub>L<sub>1</sub></sub> → T<sub>L<sub>2</sub></sub> and T<sub>2</sub> : T<sub>L<sub>2</sub></sub> → T<sub>L<sub>3</sub></sub> are compositional translations, then so is their composition T<sub>2</sub> ∘ T<sub>1</sub> : T<sub>L<sub>1</sub></sub> → T<sub>L<sub>3</sub></sub>, defined by (T<sub>2</sub> ∘ T<sub>1</sub>)(E) := T<sub>2</sub>(T<sub>1</sub>(E)) for all E ∈ T<sub>L<sub>1</sub></sub>.

**Proof:** (1) T<sub>2</sub>(T<sub>1</sub>(E[σ])) <sup>α</sup>= T<sub>2</sub>(T<sub>1</sub>(E)[T<sub>1</sub> ∘ σ]) <sup>α</sup>= T<sub>2</sub>(T<sub>1</sub>(E))[T<sub>2</sub> ∘ T<sub>1</sub> ∘ σ] for each σ : X ⇀ T<sub>L<sub>1</sub></sub> and E ∈ T<sub>L<sub>1</sub></sub>. Here the derivation of the first <sup>α</sup>= uses Property (2) of Definition 18—and this is the reason for requiring that property.

(2) E <sup>α</sup>= F implies T<sub>1</sub>(E) <sup>α</sup>= T<sub>1</sub>(F) and T<sub>2</sub>(T<sub>1</sub>(E)) <sup>α</sup>= T<sub>2</sub>(T<sub>1</sub>(F)) for all E, F ∈ T<sub>L<sub>1</sub></sub>. (3) T<sub>2</sub>(T<sub>1</sub>(X)) = T<sub>2</sub>(X) = X for each X ∈ X.

**Theorem 5.** Let L and L′ be languages that satisfy Postulates 1 and 2, and <sup>•</sup>∼ a preorder that respects <sup>α</sup>=<sub>L</sub> and <sup>α</sup>=<sub>L′</sub>. If any valid (or correct) translation from L into L′ up to <sup>•</sup>∼ exists, then there exists a compositional translation that is valid (or correct) up to <sup>•</sup>∼. ¶

Hence, for the purpose of comparing the expressive power of languages, valid translations between them can be assumed to be compositional. For correct translations this was already established in [16], but assuming (2), a stronger version of Postulate 2.

I can now establish the theorem promised in Sect. 9. In view of Theorem 5, no great sacrifices are made by assuming that the translation T is compositional. The other "mild conditions" needed are Postulate 2 for L′ and ≈ respecting <sup>α</sup>=<sub>L′</sub>.

**Theorem 6.** Let L be a closed-term language and L′ a language that satisfies Postulate 2. Let T be a compositional translation from L into L′ that is valid up to ∼. Let ≈ be any congruence for L′ containing <sup>α</sup>=<sub>L′</sub> and contained in ∼. Then T is correct up to an equivalence ≈<sub>T</sub> on **V** ∪ **V**′, contained in ∼, that on **V**′ coincides with ≈. ¶

### **13 Related Work**

The concept of *full abstraction* stems from Milner [26]. It indicates a satisfactory connection between a denotational and an operational semantics of a language. Riecke [42] and Shapiro [45] adapt this notion to translations between languages.

**Definition 19.** A translation T : T<sub>L<sub>S</sub></sub> → T<sub>L<sub>T</sub></sub> is *fully abstract* w.r.t. the equivalences ∼<sub>S</sub> ⊆ T<sup>2</sup><sub>L<sub>S</sub></sub> and ∼<sub>T</sub> ⊆ T<sup>2</sup><sub>L<sub>T</sub></sub> if, for all P, Q ∈ T<sub>L<sub>S</sub></sub>, P ∼<sub>S</sub> Q ⇔ T(P) ∼<sub>T</sub> T(Q).
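On finite toy data this definition can be checked by brute force, which makes the two directions of the "⇔" tangible. The translations and equivalences below are invented purely for illustration:

```python
def fully_abstract(terms, T, sim_s, sim_t):
    """True iff P ~S Q <=> T(P) ~T T(Q) for all P, Q in terms."""
    return all(sim_s(p, q) == sim_t(T(p), T(q))
               for p in terms for q in terms)

terms = list(range(6))
sim_s = lambda a, b: a % 3 == b % 3            # source equivalence
sim_t = lambda a, b: a == b                    # target equivalence

print(fully_abstract(terms, lambda n: n % 3, sim_s, sim_t))   # → True
print(fully_abstract(terms, lambda n: n % 2, sim_s, sim_t))   # → False
```

The second translation fails: it identifies 0 and 2, which sim_s distinguishes, so the "⇐" direction is violated.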

In [42,45], ∼<sub>S</sub> and ∼<sub>T</sub> are required to be congruence closures—see [18] for more detail. The simplified definition above was used in [1,30,31]. Fu [10] bases a theory of expressiveness on full abstraction, with a divergence-preserving form of barbed branching bisimilarity [19] in the rôle of ∼<sub>S</sub> and ∼<sub>T</sub>. A comparison of full abstraction with the approach of the present paper appears in [18].

In the last twenty years, a great number of encodability and separation results have appeared, comparing CCS, Mobile Ambients, and several versions of the π-calculus (with and without recursion; with mixed choice, separated choice, or asynchronous communication) [1,2,5–8,11–13,23,30–34,38–41,43,46]; see [20,21] for an overview. Many of these results employ different and somewhat ad-hoc criteria on what constitutes a valid encoding, and thus are hard to compare with each other. Several of these criteria are discussed and compared in [35,36]. Gorla [21] collected some essential features of these approaches and integrated them in a proposal for a valid encoding that justifies most encodings and some separation results from the literature.

Like Boudol [3] and the present paper, Gorla requires a compositionality condition for encodings. However, his criterion is weaker than mine (cf. Definition 18) in that the expression E<sub>f</sub> encoding an operator f may be dependent on the set of names occurring freely in the expressions given as arguments of f. This issue is further discussed in [16]. It is an interesting topic for future research to see if there are any valid encodability results à la [21] that suffer from my proposed strengthening of compositionality.

The second criterion of [21] is a form of invariance under name-substitution. It serves to partially undo the effect of making the compositionality requirement name-dependent. In my setting I have not yet found the need for such a condition. In [16] I argue that this criterion as formalised in [21] is too restrictive.

The remaining three requirements of Gorla (the 'semantic' requirements) are very close to an instantiation of mine with a particular preorder <sup>•</sup>∼. If one takes <sup>•</sup>∼ to be weak barbed bisimilarity with explicit divergence (i.e. relating divergent states with divergent states only), using barbs external to the language, as discussed in Sect. 6, then a valid translation in my sense satisfies Gorla's semantic criteria, provided that the equivalence ≡ on the target language that acts as a parameter in Gorla's third criterion is also taken to be weak barbed bisimilarity with explicit divergence. The precise relationships between the proposals of [16,21] are further discussed in [37].

Further work is needed to sort out to what extent the two approaches have relevant differences when evaluating encoding and separation results from the literature. Another topic for future work is to sort out how dependent known encoding and separation results are on the chosen equivalence or preorder.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **A Framework for Parameterized Monitorability**

Luca Aceto<sup>1,2</sup>, Antonis Achilleos<sup>2(B)</sup>, Adrian Francalanza<sup>3</sup>, and Anna Ingólfsdóttir<sup>2</sup>

<sup>1</sup> Gran Sasso Science Institute, L'Aquila, Italy
<sup>2</sup> School of Computer Science, Reykjavik University, Reykjavik, Iceland — {luca,antonios,annai}@ru.is
<sup>3</sup> Department of Computer Science, ICT, University of Malta, Msida, Malta — adrian.francalanza@um.edu.mt

**Abstract.** We introduce a general framework for Runtime Verification, parameterized with respect to a set of conditions. These conditions are encoded in the trace generated by a monitored process, which a monitor can observe. We present this parameterized framework in its general form and prove that it corresponds to a fragment of HML with recursion, extended with these conditions. We then show how this framework can be applied to a number of instantiations of the set of conditions.

### **1 Introduction**

Runtime Verification (RV) is a lightweight verification technique that checks whether a system satisfies a correctness property by analysing the *current execution* of the system [20,29], expressed as a trace of execution events. Using the additional information obtained at runtime, the technique can often mitigate state explosion problems typically associated with more traditional verification techniques. At the same time, limiting the verification analysis to the current execution trace hinders the expressiveness of RV when compared to more exhaustive approaches. In fact, there are correctness properties that cannot be satisfactorily verified at runtime (*e.g.* the finiteness of the trace considered up to the current execution point prohibits the verification of liveness properties). For this reason, RV is often used as part of a multi-pronged approach towards ensuring system correctness [5,6,8,14,15,25], *complementing* other verification techniques such as model checking, testing and type checking.

In order to attain an effective verification strategy consisting of multiple verification techniques that include RV, it is crucial to understand the expressive power of each technique: one can then determine how to best decompose the verification burden into subtasks that can then be assigned to the most appropriate verification technique. *Monitorability* concerns itself with identifying the

This research was supported by the project "TheoFoMon: Theoretical Foundations for Monitorability" (grant number: 163406-051) of the Icelandic Research Fund.

© The Author(s) 2018

C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 203–220, 2018. https://doi.org/10.1007/978-3-319-89366-2\_11

properties that are analysable by RV. In [21,22] (and subsequently in [2]), the problem of monitorability was studied for properties expressed in a variant of the modal μ-calculus [26] called μHML [28]. The choice of the logic was motivated by the fact that it can embed widely used logics such as CTL and LTL, and by the fact that it is agnostic of the underlying verification method used—this leads to better separation of concerns and guarantees a good level of generality for the results obtained. The main result in [2,21,22] is the identification of a monitorable syntactic subset of the logic μHML (*i.e.,* a set of logical formulas for which monitors carrying out the necessary runtime analysis exist) that is shown to be maximally expressive (*i.e.,* any property that is monitorable in the logic may be expressed in terms of this syntactic subset). We are unaware of other maximality results of this kind in the context of RV.

In this work we strive towards extending the monitorability limits identified in [2,21,22] for μHML. Particularly, for any logic or specification language, monitorability is a function of the underlying monitoring setup. In [2,21,22], the framework assumes a *classical* monitoring setup, whereby a (single) monitor incrementally analyses an ordered trace of events describing the computation steps that were executed by the system. A key observation made by this paper is that, in general, execution traces need *not* be limited to the reporting of events *that happened*. For instance, they may describe events that *could not have happened* at specific points in the execution of a system. Alternatively, they may also include descriptions for depth-bounded trees of computations that *were possible* at specific points in an execution. We conjecture that there are instances where this additional information can be feasibly encoded in a trace, either dynamically or by way of a pre-processing phase (based, *e.g.*, on the examination of logs of previous system executions, or on the full static checking of sub-components making up the system). More importantly, this additional information could, in principle, permit the verification of more properties at runtime.

The contribution of this paper is a study of how the aforementioned augmented monitoring setups may affect the monitorability of μHML, potentially extending the maximality limits identified in [2,21,22]. More concretely:


The remainder of the paper is structured as follows. After outlining the necessary preliminaries in Sect. 2, we develop our parameterized monitoring framework with conditions in Sect. 3 for a monitoring setup that allows monitors to observe both silent and external actions of systems. The two condition instantiations for this strong setting are presented in Sect. 4. In Sect. 5 we extend the parameterized monitoring framework with conditions to a weak monitoring setup that abstracts from internal moves, followed by two instantiations similar to those presented in Sect. 4. Section 6 concludes by discussing related and future work.

### **2 Background**

**Labelled Transition Systems.** We assume a set of *external* actions Act and a distinguished *silent* action τ. We let α range over Act and μ over Act ∪ {τ}. A *Labelled Transition System* (LTS) on Act is a triple

$$L = \langle P, \mathrm{Act}, \to_L \rangle,$$

where P is a nonempty set of system states, referred to as *processes* p, q, ..., and →_L ⊆ P × (Act ∪ {τ}) × P is a transition relation. We write $p \xrightarrow{\mu}_L q$ instead of (p, μ, q) ∈ →_L, and $p \xrightarrow{\mu}_L$ to mean that there is some q such that $p \xrightarrow{\mu}_L q$. We use $p \overset{\mu}{\Longrightarrow}_L q$ to mean that, in L, p can derive q using a single μ action and any number of silent actions, *i.e.,* $p (\xrightarrow{\tau}_L)^* \xrightarrow{\mu}_L (\xrightarrow{\tau}_L)^* q$. We distinguish between (general) *traces* s = μ₁μ₂...μᵣ ∈ (Act ∪ {τ})* and *external traces* t = α₁α₂...αᵣ ∈ Act*. For a general trace s = μ₁μ₂...μᵣ, $p \xrightarrow{s}_L q$ means $p \xrightarrow{\mu_1}_L \xrightarrow{\mu_2}_L \cdots \xrightarrow{\mu_r}_L q$; for an external trace t = α₁α₂...αᵣ, $p \overset{t}{\Longrightarrow}_L q$ means $p \overset{\alpha_1}{\Longrightarrow}_L \overset{\alpha_2}{\Longrightarrow}_L \cdots \overset{\alpha_r}{\Longrightarrow}_L q$ when r ≥ 1, and $p (\xrightarrow{\tau}_L)^* q$ when t = ε is the empty trace. We occasionally omit the subscript L when it is clear from the context.
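As a concrete (and entirely illustrative) reading of these definitions, the strong and weak transition relations can be prototyped over a finite LTS. The encoding of an LTS as a set of (p, μ, q) triples and all function names below are our own, not part of the paper:

```python
TAU = "tau"  # the distinguished silent action

def strong_steps(lts, p, mu):
    """All q with p --mu--> q; an LTS is encoded as a set of (p, mu, q) triples."""
    return {q for (p0, m, q) in lts if p0 == p and m == mu}

def tau_closure(lts, ps):
    """All states reachable from ps via zero or more tau-steps."""
    seen, frontier = set(ps), set(ps)
    while frontier:
        frontier = {q for p in frontier for q in strong_steps(lts, p, TAU)} - seen
        seen |= frontier
    return seen

def weak_steps(lts, p, alpha):
    """All q with p ==alpha==> q, i.e. tau* alpha tau*."""
    before = tau_closure(lts, {p})
    mids = {q for p0 in before for q in strong_steps(lts, p0, alpha)}
    return tau_closure(lts, mids)

# Example: p --tau--> p1 --a--> p2
lts = {("p", TAU, "p1"), ("p1", "a", "p2")}
```

With this example LTS, p has no strong a-derivative, yet p ==a==> p2, which is exactly the gap between the two relations that the weak setting of Sect. 5 exploits.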

*Example 1.* The (standard) regular fragment of CCS [30] with grammar:

$$p, q \in \mathrm{Proc} ::= \mathsf{nil} \;\mid\; \mu.p \;\mid\; p+q \;\mid\; \mathsf{rec}\ x.p \;\mid\; x,$$

where x, y, z, . . . are from some countably infinite set of variables Var, and the transition relation defined as:

$$\mathrm{Act}\,\frac{}{\mu.p \xrightarrow{\mu} p} \qquad \mathrm{Rec}\,\frac{p[\mathsf{rec}\ x.p/x] \xrightarrow{\mu} q}{\mathsf{rec}\ x.p \xrightarrow{\mu} q} \qquad \mathrm{SelL}\,\frac{p \xrightarrow{\mu} p'}{p+q \xrightarrow{\mu} p'} \qquad \mathrm{SelR}\,\frac{q \xrightarrow{\mu} q'}{p+q \xrightarrow{\mu} q'}$$

constitutes the LTS ⟨Proc, Act, →⟩. We often use the CCS notation above to describe processes.

**Specification Logic.** Properties about the behaviour of processes may be specified via the logic μHML [4,28], a reformulation of the modal μ-calculus [26].

**Definition 1.** μHML *formulae on* Act *are defined by the grammar:*

$$\varphi, \psi \in \mu\mathrm{HML} ::= \mathsf{tt} \;\mid\; \mathsf{ff} \;\mid\; \varphi \wedge \psi \;\mid\; \varphi \vee \psi \;\mid\; \langle\mu\rangle\varphi \;\mid\; [\mu]\varphi \;\mid\; \mathsf{min}\ X.\varphi \;\mid\; \mathsf{max}\ X.\varphi \;\mid\; X,$$

*where X, Y, Z, ... come from a countably infinite set of logical variables LVar. For a given LTS L = ⟨P, Act, →⟩, an environment ρ is a function ρ : LVar → 2^P. Given an environment ρ, X ∈ LVar, and S ⊆ P, ρ[X ↦ S] denotes the environment where ρ[X ↦ S](X) = S and ρ[X ↦ S](Y) = ρ(Y) for all Y ≠ X. The semantics of a μHML formula ϕ over an LTS L relative to an environment ρ, denoted as [[ϕ, ρ]]_L, is defined as follows:*

$$\begin{aligned}
[[\mathsf{tt}, \rho]]_L &= P & [[\mathsf{ff}, \rho]]_L &= \emptyset & [[X, \rho]]_L &= \rho(X) \\
[[\varphi_1 \wedge \varphi_2, \rho]]_L &= [[\varphi_1, \rho]]_L \cap [[\varphi_2, \rho]]_L & [[\varphi_1 \vee \varphi_2, \rho]]_L &= [[\varphi_1, \rho]]_L \cup [[\varphi_2, \rho]]_L \\
[[[\mu]\varphi, \rho]]_L &= \{ p \mid \forall q.\ p \xrightarrow{\mu} q \text{ implies } q \in [[\varphi, \rho]]_L \} \\
[[\langle\mu\rangle\varphi, \rho]]_L &= \{ p \mid \exists q.\ p \xrightarrow{\mu} q \text{ and } q \in [[\varphi, \rho]]_L \} \\
[[\mathsf{min}\ X.\varphi, \rho]]_L &= \bigcap \{ S \mid S \supseteq [[\varphi, \rho[X \mapsto S]]]_L \} \\
[[\mathsf{max}\ X.\varphi, \rho]]_L &= \bigcup \{ S \mid S \subseteq [[\varphi, \rho[X \mapsto S]]]_L \}
\end{aligned}$$

*Formulae ϕ and ψ are equivalent, denoted as ϕ ≡ ψ, when [[ϕ, ρ]]_L = [[ψ, ρ]]_L for every environment ρ and LTS L. We often consider closed formulae and simply write [[ϕ]]_L for [[ϕ, ρ]]_L when the semantics of ϕ is independent of ρ.*
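On a finite LTS, the fixpoint clauses of Definition 1 can be computed by straightforward Kleene iteration. The sketch below uses our own encoding of formulas as nested tuples (e.g. `("box", "a", ("ff",))` for [a]ff); it is an illustration only, not part of the development:

```python
def sem(phi, lts, states, env=None):
    """[[phi, rho]]_L for a muHML formula over a finite LTS given as a set of
    (p, mu, q) triples; env maps logical variables to sets of states."""
    env = env or {}
    tag = phi[0]
    if tag == "tt":  return set(states)
    if tag == "ff":  return set()
    if tag == "var": return env[phi[1]]
    if tag == "and": return sem(phi[1], lts, states, env) & sem(phi[2], lts, states, env)
    if tag == "or":  return sem(phi[1], lts, states, env) | sem(phi[2], lts, states, env)
    if tag in ("box", "dia"):
        mu, body = phi[1], sem(phi[2], lts, states, env)
        succ = lambda s: [q for (p0, m, q) in lts if p0 == s and m == mu]
        if tag == "box":
            return {p for p in states if all(q in body for q in succ(p))}
        return {p for p in states if any(q in body for q in succ(p))}
    if tag in ("min", "max"):
        x, body = phi[1], phi[2]
        s = set() if tag == "min" else set(states)
        while True:  # Kleene iteration converges on a finite state space
            s2 = sem(body, lts, states, {**env, x: s})
            if s2 == s:
                return s
            s = s2
    raise ValueError(f"unknown constructor {tag}")
```

For example, over the one-transition LTS {(p, a, q)}, the formula ⟨a⟩tt holds exactly at p, while max X.[a]X holds everywhere, matching the fixpoint clauses above.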

The logic μHML is very expressive. It is also agnostic of the technique to be employed for verification. The property of monitorability, however, fundamentally relies on the monitoring setup considered.

**Monitoring Systems.** A *monitoring setup* on Act is a triple ⟨M, I, L⟩, where L is a system LTS on Act, M is a monitor LTS on Act, and I is the instrumentation describing how to compose L and M into an LTS, denoted by I(M, L), on Act. We call the pair (M, I) a *monitoring system* on Act. For M = ⟨Mon, Act, →_M⟩, Mon is a set of monitor states (ranged over by m) and →_M is the *monitor semantics*, described in terms of the behavioural state transitions a monitor takes when it analyses trace events μ ∈ Act ∪ {τ}. The states of the composite LTS I(M, L) are written as m ◁ p, where m is a monitor state and p is a system state; the monitored-system transition relation is denoted here by →_{I(M,L)}. We present our results with a focus on *rejection* monitors, *i.e.,* monitors with a designated rejection state no, and hence safety fragments of the logic μHML. However, our results and arguments apply dually to acceptance monitors (with a designated acceptance state yes) and co-safety properties; see [21,22] for details.

**Definition 2.** *Fix a monitoring setup ⟨M, I, L⟩ on Act, and let m be a monitor state of M and ϕ a closed formula of μHML on Act. We say that m* (M, I)-rejects *(or simply* rejects*, if M and I are evident) a process p in L, written as* **rej**_{M,I,L}(m, p)*, when there are a process q in L and a trace s ∈ (Act ∪ {τ})\* such that $m \triangleleft p \xrightarrow{s}_{I(M,L)} \mathsf{no} \triangleleft q$. We say that m* (M, I)-monitors for ϕ *on L whenever*

for each process p of L, **rej**_{M,I,L}(m, p) if and only if p ∉ [[ϕ]]_L.

*(Subscripts are omitted when they are clear from the context.) Finally,* m (M, I) monitors for ϕ *when* m (M, I)*-monitors for* ϕ *on* L *for every LTS* L *on* Act*. The monitoring system* (M, I) *is often omitted when evident.* -

We define monitorability for μHML in terms of monitoring systems (M, I).

**Definition 3.** *Fix a monitoring system* (M, I) *and a fragment* Λ *of* μ*HML. We say that* (M, I) *rejection-monitors for* Λ *whenever:*


We note that if a monitoring system and a fragment Λ of μHML satisfy the conditions of Definition 3, then Λ is the largest fragment of μHML that is monitored by the monitoring system. Stated otherwise, any other logic fragment Λ′ that satisfies the conditions of Definition 3 must be equally expressive to Λ, *i.e.,* ∀ϕ ∈ Λ′ · ∃ψ ∈ Λ · ϕ ≡ ψ and vice versa. Definition 3 can be given dually for acceptance-monitorability, when considering acceptance monitors. We next review two monitoring systems that respectively rejection-monitor for two different fragments of μHML. We omit the corresponding monitoring systems for acceptance monitors, which monitor for the dual fragments of μHML.

**The Basic Monitoring Setup.** The following monitoring system, presented in [2], does *not* distinguish between silent actions and external actions.

**Definition 4.** *A* basic *monitor on* Act *is defined by the grammar:*

m, n ∈ Mon_b ::= end | no | μ.m | m + n | rec x.m | x,

*where x comes from a countably infinite set of monitor variables. Constant no denotes the* rejection verdict *state, whereas end denotes the* inconclusive verdict *state. The basic monitor LTS M_b is the one whose states are the closed monitors of Mon_b and whose transition relation is defined by the (standard) rules in Table 1 (we elide the symmetric rule for m + n).*

Note that by rule mVrd in Table 1, verdicts are irrevocable and monitors can only describe suffix-closed behaviour.


**Table 1.** Behaviour and instrumentation rules for monitored systems (*v*∈{end*,* no}).

**Definition 5.** *Given a system LTS* L *and a monitor LTS* M *that agree on* Act*, the* basic *instrumentation LTS, denoted by* Ib(M,L)*, is defined by the rules* iMon *and* iTer *in Table 1. (We do* not *consider rule* iAbs *for now.)* -

Instrumentation often relegates monitors to a passive role, whereby a monitored system transitions only when the system itself can. In rule iMon, when the system produces a trace event μ that the monitor is able to analyse (and transition from m to n), the constituent components of a monitored system mp move in lockstep. Conversely, when the system produces an event μ that the monitor is *unable* to analyse, the monitored system still executes, according to iTer, but the monitor transitions to the inconclusive state, where it remains for the rest of the computation.
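The interplay of rules mVrd, iMon and iTer can be sketched for recursion-free basic monitors as follows; the tuple encoding and function names are our own, and rec is omitted for brevity:

```python
NO, END = ("no",), ("end",)

def monitor_step(m, mu):
    """Next state of a basic monitor analysing event mu, or None if it is stuck."""
    tag = m[0]
    if tag in ("no", "end"):   # mVrd: verdicts are irrevocable
        return m
    if tag == "act":           # mAct: prefix mu.m
        return m[2] if m[1] == mu else None
    if tag == "choice":        # mSelL / mSelR: m + n
        left = monitor_step(m[1], mu)
        return left if left is not None else monitor_step(m[2], mu)
    return None

def instrument(m, trace):
    """Basic instrumentation: iMon when the monitor can follow the event,
    iTer (inconclusive verdict end) when it cannot."""
    for mu in trace:
        nxt = monitor_step(m, mu)
        m = nxt if nxt is not None else END   # iTer
    return m

# A monitor rejecting any computation that starts with a followed by b:
reject_ab = ("act", "a", ("act", "b", NO))
```

Running `reject_ab` on the trace a·b yields no, on a·c it drops to end by iTer, and once no is reached any further events leave it there, illustrating irrevocability.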

We refer to the pair (M_b, I_b) from Definitions 4 and 5 as the *basic monitoring system*. For each system LTS L that agrees with the basic monitoring system on Act, we can show a correspondence between the respective monitoring setup ⟨M_b, I_b, L⟩ and the following syntactic subset of μHML.

**Definition 6.** *The* safety μHML *is defined by the grammar:*

θ, χ ∈ sHML ::= tt | ff | [μ]θ | θ ∧ χ | max X.θ | X

**Theorem 1 (**[2]**).** *The* basic *monitoring system* (Mb, Ib) *monitors for the logical fragment* sHML*.*

The proof of Theorem 1 relies on a monitor synthesis and a formula synthesis function. The monitor synthesis function, ⦃−⦄ : sHML → Mon_b, is defined on the structure of the input formula and assumes a bijective mapping between formula variables and monitor recursion variables:

$$\begin{aligned}
⦃\mathsf{tt}⦄ &= \mathsf{end} & ⦃\mathsf{ff}⦄ &= \mathsf{no} & ⦃X⦄ &= x \\
⦃[\mu]\psi⦄ &= \begin{cases} \mathsf{end} & \text{if } ⦃\psi⦄ = \mathsf{end} \\ \mu.⦃\psi⦄ & \text{otherwise} \end{cases} & ⦃\mathsf{max}\ X.\psi⦄ &= \begin{cases} \mathsf{end} & \text{if } ⦃\psi⦄ = \mathsf{end} \\ \mathsf{rec}\ x.⦃\psi⦄ & \text{otherwise} \end{cases}
\end{aligned}$$

$$⦃\psi_1 \wedge \psi_2⦄ = \begin{cases} ⦃\psi_1⦄ & \text{if } ⦃\psi_2⦄ = \mathsf{end} \\ ⦃\psi_2⦄ & \text{if } ⦃\psi_1⦄ = \mathsf{end} \\ ⦃\psi_1⦄ + ⦃\psi_2⦄ & \text{otherwise} \end{cases}$$

The case analyses in the above synthesis procedure handle some of the redundancies that may be present in formula specifications. For instance, it turns out that max X.[μ]tt ≡ tt and, accordingly, ⦃max X.[μ]tt⦄ = ⦃tt⦄ = end. The formula synthesis function is defined analogously (see [2,22] for more details).
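The case analysis of the synthesis function transcribes almost verbatim into code. The sketch below uses our own tuple encodings of sHML formulas and monitors; it is an illustration of the definition, not the paper's artefact:

```python
def synth(phi):
    """Monitor synthesis for sHML: formulas are nested tuples, e.g.
    ("box", "a", ("ff",)) for [a]ff; monitors are returned in the same style."""
    tag = phi[0]
    if tag == "tt":  return ("end",)
    if tag == "ff":  return ("no",)
    if tag == "var": return ("mvar", phi[1])  # assumes vars map bijectively
    if tag == "box":
        m = synth(phi[2])
        return ("end",) if m == ("end",) else ("act", phi[1], m)
    if tag == "max":
        m = synth(phi[2])
        return ("end",) if m == ("end",) else ("rec", phi[1], m)
    if tag == "and":
        m1, m2 = synth(phi[1]), synth(phi[2])
        if m1 == ("end",): return m2
        if m2 == ("end",): return m1
        return ("choice", m1, m2)
    raise ValueError(f"not an sHML constructor: {tag}")
```

In particular, `synth` maps the redundant formula max X.[a]tt to the inconclusive monitor end, matching the example discussed above.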

**Monitoring for External Actions.** The results obtained in [21,22] can be expressed and recovered within our more general framework. We can express a weak version of the modalities employed in [3,21,22] as follows:

$$\begin{aligned}
[[\mu]]\varphi &\equiv \mathsf{max}\ X.([\tau]X \wedge [\mu]\,\mathsf{max}\ Y.(\varphi \wedge [\tau]Y)) \quad\text{and} \\
\langle\langle\mu\rangle\rangle\varphi &\equiv \mathsf{min}\ X.(\langle\tau\rangle X \vee \langle\mu\rangle\,\mathsf{min}\ Y.(\varphi \vee \langle\tau\rangle Y)).
\end{aligned}$$

**Definition 7.** Weak safety μHML*, presented in [21,22], is defined by the grammar:*

π, κ ∈ WsHML ::= tt | ff | [[α]]π | π ∧ κ | max X.π | X.

**Definition 8.** *The set Mon_e of external monitors on Act contains all the basic monitors that do not use the silent action τ. The corresponding external monitor LTS M_e is defined similarly to M_b, but with the closed monitors in Mon_e as its states.* External instrumentation*, denoted by I_e, is defined by the* three *rules iMon, iTer and iAbs in Table 1, where in the case of iMon and iTer, action μ is substituted by the external action α. We refer to the pair (M_e, I_e) as the* external monitoring system*, amounting to the setup in [21,22].*

**Theorem 2 (**[22]**).** *The* external *monitoring system* (Me, Ie) *rejection-monitors for the logical fragment* WsHML*.*

### **3 Monitors that Detect Conditions**

Given a set of processes P, a pair (C, r) is a *condition framework* when C is a non-empty set of *conditions* and r : C → 2^P is a valuation function. We assume a fixed condition framework (C, r) and extend the syntax and semantics of μHML so that, for every condition c ∈ C, both c and ¬c are formulas and, for every LTS L on the set of processes P, [[c]] = r(c) and [[¬c]] = P \ r(c). We call the extended logic μHML^{(C,r)}. Since, in all the instances we consider, r is easily inferred from C, it is often omitted: we simply write C instead of (C, r) and μHML^C instead of μHML^{(C,r)}. We say that process p satisfies c when p ∈ [[c]]. We assume that C is closed under negation, meaning that for every c ∈ C there is some c′ ∈ C such that [[c′]] = [[¬c]]. Conditions represent certain properties of processes that the instrumentation is able to report.

We extend the syntax of monitors, so that if m is a monitor and c a condition, then c.m is a monitor. The idea is that if c.m detects that the process satisfies c, then it can transition to m.

**Definition 9.** *A* basic C*-*monitor *on* Act *is defined by the grammar:*

m, n ∈ Mon_b^C ::= end | no | μ.m | c.m | m + n | rec x.m | x,

*where x comes from a countably infinite set of monitor variables and c ∈ C. Basic C-monitor behaviour is defined as in Table 1, but allowing μ to range over Act ∪ C ∪ {τ}. We call the resulting monitor LTS M_b^C.*

A monitor detects the satisfaction of condition c when the monitored system has transitioned to a process that satisfies c. To express this intuition, we add rule iCon to the instrumentation rules of Table 1:

$$\mathrm{iCon}\;\frac{p \in [[c]] \qquad m \xrightarrow{c}_{M} n}{m \,\triangleleft\, p \;\xrightarrow{\tau}_{I(M,L)}\; n \,\triangleleft\, p}$$

We call the resulting instrumentation I_b^C. We observe that the resulting monitoring setup is transparent with respect to external actions: an external trace of the monitored system results in exactly the same external trace of the instrumentation LTS. General traces, however, are not preserved, since rule iCon may introduce additional silent transitions into the instrumentation trace; we argue that this is an expected consequence of the instrumentation verifying the conditions of C. C-monitors monitor for sHML^C:

**Definition 10.** *The strong safety fragment of* μHML<sup>C</sup> *is defined as:*

ϕ, ψ ∈ sHML^C ::= tt | ff | [μ]ϕ | ¬c ∨ ϕ | ϕ ∧ ψ | max X.ϕ | X,

*where* <sup>c</sup> <sup>∈</sup> <sup>C</sup>*. We note that* <sup>¬</sup><sup>c</sup> <sup>∨</sup> <sup>ϕ</sup> *can be viewed as an implication* <sup>c</sup> <sup>→</sup> <sup>ϕ</sup> *asserting that if* c *holds, then* ϕ *must also hold.* -

It is immediate to see that sHML^C is a fragment of μHML^C and, when C ⊆ μHML, it is also a fragment of μHML. Finally, if C is closed under negation, then ¬c ∨ ϕ can be rewritten as c′ ∨ ϕ, where [[c′]] = [[¬c]]; in the following we often take advantage of this equivalence to simplify the syntax of sHML^C.

**Theorem 3.** *The monitoring system (M_b^C, I_b^C) monitors for sHML^C.*

We note that Theorem 3 implies that sHML^C is the largest monitorable fragment of μHML^C, relative to C.
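A minimal sketch of how rule iCon extends the basic instrumentation: here a system run is a list of (state, action, next-state) triples, and `sat(p, c)` stands in for the valuation p ∈ [[c]]. Both the encoding and the names are our own assumptions:

```python
def c_instrument(m, steps, sat):
    """Run a C-monitor over a system run given as (state, action, next_state)
    triples; sat(p, c) decides p in [[c]].  Handles no/end/act/cond prefixes."""
    for (p, mu, q) in steps:
        # iCon: a condition prefix c.m fires silently while p satisfies c
        while m[0] == "cond" and sat(p, m[1]):
            m = m[2]
        if m[0] in ("no", "end"):      # verdicts are irrevocable
            return m
        if m[0] == "act" and m[1] == mu:
            m = m[2]                   # iMon: monitor follows the event
        else:
            return ("end",)            # iTer: the monitor cannot analyse mu
    return m
```

Modelling states as sets of the condition names they satisfy (with `sat = lambda p, c: c in p`), the monitor c1.a.no reaches no on a run whose initial state satisfies c1 and performs a, but drops to end if c1 fails to hold.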

### **4 Instantiations**

We consider two possible instantiations for parameter C in the framework presented in Sect. 3. Since each of these instantiations consists of a fragment from the logic μHML itself, they both show how monitorability for μHML can be extended when using certain augmented traces.

#### **4.1 The Inability to Perform an Action**

The monitoring framework of [2,22] (used also in other works such as [18,19]), is based on the idea that, while a system is executing, it performs discrete computational steps called events (actions) that are recorded and relayed to the monitor for analysis. Based on the analysed events, the monitor then transitions from state to state. One may however also consider instrumentations that record a system's *inability* to perform a certain action. Examples of this arise naturally in situations where actions are requested unsuccessfully by an external entity on a system, or whenever the instrumentation is able to report system stability (*i.e.,* the inability of performing internal actions). For instance, such observations were considered in [1,31], in the context of testing preorders.

In our setting, a process is unable to perform action μ exactly when it satisfies [μ]ff. For monitors that are able to detect the inability or failure of a process to perform actions, we set F_Act = {[μ]ff | μ ∈ Act ∪ {τ}} as the set of conditions. By Theorem 3, the resulting maximal monitorable fragment of μHML is given by the grammar:

$$\varphi, \psi \in \mathrm{sHML}^{F_{\mathrm{Act}}} ::= \mathsf{tt} \;\mid\; \mathsf{ff} \;\mid\; [\mu]\varphi \;\mid\; \langle\mu\rangle\mathsf{tt} \vee \varphi \;\mid\; \varphi \wedge \psi \;\mid\; \mathsf{max}\ X.\varphi \;\mid\; X.$$

We note that μHML is closed under negation; in particular, ¬[μ]ff ≡ ⟨μ⟩tt.

**Proposition 1.** *The monitoring system (M_b^{F_Act}, I_b^{F_Act}) monitors for the logical fragment sHML^{F_Act}.*

A special case of interest is that of monitors that can detect process stability, *i.e.,* processes satisfying [τ]ff. Such monitors monitor for sHML^{{[τ]ff}}, namely sHML from Definition 6 extended with formulas of the form ⟨τ⟩tt ∨ ϕ.

#### **4.2 Depth-Bounded Static Analysis**

In multi-pronged approaches using a combination of verification techniques, one could statically verify parts of a program (from specific execution points) with respect to certain behavioural properties using techniques such as Bounded Model Checking [11] and Partial Model Checking [7]. Typical examples arise in component-based software using modules, objects or agents that can be verified in isolation. This pre-computed verification can then be recorded as annotations to a component and subsequently reported by the instrumentation as part of the execution trace. This strategy would certainly be feasible for depth-bounded static analysis for which the original logic HML [24]—the recursion-free fragment of μHML given below—is an ideal fit.

$$\eta, \chi \in \mathrm{HML} ::= \mathsf{tt} \;\mid\; \mathsf{ff} \;\mid\; \eta \wedge \chi \;\mid\; \eta \vee \chi \;\mid\; \langle\mu\rangle\eta \;\mid\; [\mu]\eta$$

Again, HML is closed under negation [4]. If we allow monitors to detect the satisfaction of these kinds of conditions, then, according to Theorem 3, the maximal fragment of μHML that we can monitor for, with HML as a condition framework, is sHML^{HML}, defined by the following grammar:

ϕ, ψ ::= tt | ff | [μ]ϕ | η ∨ ϕ | ϕ ∧ ψ | max X.ϕ | X,

where η ∈ HML. Another way to describe sHML^{HML} is as the μHML fragment that includes all formulas whereby, for every subformula of the form ϕ ∨ ψ, at most *one* of the constituent subformulas ϕ, ψ uses recursion.

**Proposition 2.** *The monitoring system (M_b^{HML}, I_b^{HML}) monitors for the logical fragment sHML^{HML}.*

Instead of HML, we can alternatively use the fragment HML^d of HML that only allows formulas whose modal nesting depth is at most d. Since the complexity of checking HML formulas depends directly on this modal depth, there are cases where the overhead of checking such formulas is low enough for them to be adequately checked at runtime rather than statically.
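The modal-depth restriction defining HML^d is a purely syntactic measure, which can be checked as follows (our tuple encoding of HML again; illustrative only):

```python
def modal_depth(phi):
    """Nesting depth of <mu>/[mu] modalities in an HML formula encoded as
    nested tuples, e.g. ("box", "a", ("dia", "b", ("tt",)))."""
    tag = phi[0]
    if tag in ("tt", "ff"):
        return 0
    if tag in ("and", "or"):
        return max(modal_depth(phi[1]), modal_depth(phi[2]))
    if tag in ("box", "dia"):
        return 1 + modal_depth(phi[2])
    raise ValueError(f"not an HML constructor: {tag}")

def in_HML_d(phi, d):
    """Membership in the fragment HML^d: modal depth at most d."""
    return modal_depth(phi) <= d
```

For instance, [a]⟨b⟩tt has modal depth 2 and so lies in HML^2 but not in HML^1.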

### **5 Extending External Monitorability**

We explore the impact of considering traces that encode conditions from Sect. 3 on the monitorability of the weak version of the logic used in [21,22]:

$$\varphi, \psi \in \mathrm{W}\mu\mathrm{HML} ::= \mathsf{tt} \;\mid\; \mathsf{ff} \;\mid\; \varphi \wedge \psi \;\mid\; \varphi \vee \psi \;\mid\; \langle\langle\alpha\rangle\rangle\varphi \;\mid\; [[\alpha]]\varphi \;\mid\; \mathsf{min}\ X.\varphi \;\mid\; \mathsf{max}\ X.\varphi \;\mid\; X$$

This version of the logic abstracts away from internal moves performed by the system—note that the weak modality formulas are restricted to external actions α as opposed to the general ones, μ. The semantics follows that presented in Sect. 2, but can alternatively be given a more direct inductive definition, *e.g.*

$$[[\,[[\alpha]]\varphi, \rho\,]] = \{\, p \mid \forall q.\ p \overset{\alpha}{\Longrightarrow} q \text{ implies } q \in [[\varphi, \rho]] \,\}.$$

The main aim of this section is to extend the maximally-expressive monitorable subset of μHML that was identified in [21,22] using the framework developed in Sect. 3.

### **5.1 External Monitoring with Conditions**

We define the external monitoring system with conditions similarly to Sect. 3. The syntax of Definition 8 is extended so that, for any instance of C, if m is a monitor and c a condition from C, then c.m is a monitor.

**Definition 11.** *An* external C*-*monitor *on* Act *is defined by the grammar:*

m, n ∈ Mon_e^C ::= end | no | α.m | c.m | m + n | rec x.m | x,

*where c ∈ C. C-monitor behaviour is defined as in Table 1, but extending rule mAct to condition prefixes that generate condition actions (i.e., μ ranges over Act ∪ C). We call the resulting monitor LTS M_e^C.*

*For the instrumentation relation, called I_e^C, we consider the rules iMon, iTer from Table 1 for external actions α instead of the general action μ, rule iAbs from the same table, and rule iCon from Sect. 3.*

Note that the monitoring system (M_e^C, I_e^C) may be used to detect τ transitions *implicitly*; we conjecture that this cannot be avoided in general. Consider two conflicting conditions c1 and c2, *i.e.,* [[c1]] ∩ [[c2]] = ∅. Definition 11 permits monitors of the form c1.c2.m that encode the fact that state m can only be reached when the system under scrutiny performs a non-empty sequence of τ-moves to transition from a state satisfying c1 to another state satisfying c2. This, in some sense, is also related to the obscure silent action monitoring studied in [2].

We identify the grammar for the maximally-expressive monitorable syntactic subset of the logic WμHML. It uses the formula [[ε]]ϕ defined as:

$$[[\varepsilon]]\varphi \equiv \mathsf{max}\ X.(\varphi \wedge [\tau]X).$$

The modality [[ε]]ϕ quantifies universally over the set of processes that can be reached from a given one via any number of silent steps. Together with its dual ⟨⟨ε⟩⟩ϕ modality, [[ε]]ϕ is used in the modal characterisation of weak bisimilarity [30,34], in which τ transitions from one process may be matched by a (possibly empty) sequence of τ transitions from another.

**Definition 12.** *The weak safety fragment of* WμHML *with* C *is defined as:*

$$\varphi, \psi \in \mathrm{WsHML}^{C} ::= \mathsf{tt} \;\mid\; \mathsf{ff} \;\mid\; [[\alpha]]\varphi \;\mid\; [[\varepsilon]](\neg c \vee \varphi) \;\mid\; \varphi \wedge \psi \;\mid\; \mathsf{max}\ X.\varphi \;\mid\; X,$$

*where c ∈ C.*

**Theorem 4.** *The monitoring system (M_e^C, I_e^C) monitors for WsHML^C.*

We highlight the need to insulate the appearance of the implication <sup>¬</sup><sup>c</sup> <sup>∨</sup> <sup>ϕ</sup> from internal system behaviour by using the modality [[ε]] in Definition 12. For conditions that are invariant under τ -transitions, this modality is not required but it cannot be eliminated otherwise; we revisit this point in Example 2.

#### **5.2 Instantiating External Monitors with Conditions**

We consider three different instantiations to our parametric external monitoring system of Sect. 5.1.

**Recursion-Free Formulas.** The weak version of HML, denoted by wHML, is the recursion-free fragment of WμHML. Similarly to what was argued in Sect. 4.2, it is an appropriate set of conditions to instantiate set C in WsHML^C, and the maximal monitorable fragment of WμHML with conditions from wHML is WsHML^{wHML}, defined by the following grammar, where η ∈ wHML:

ϕ, ψ ::= tt | ff | [[α]]ϕ | [[ε]](η ∨ ϕ) | ϕ ∧ ψ | max X.ϕ | X.

**Proposition 3.** *The monitoring system (M_e^{wHML}, I_e^{wHML}) monitors for the logical fragment WsHML^{wHML}.*

An important observation (that is perhaps surprising) is that WsHML^{wHML} is *not* a fragment of WμHML, as the following example demonstrates.

*Example 2.* Although for any (closed) WsHML formula ϕ we have the logical equivalence [[ε]]ϕ ≡ ϕ (notice that the monitor for ϕ that is guaranteed by Theorem 2 also monitors for [[ε]]ϕ), this logical equivalence does not hold for an arbitrary formula ϕ from WμHML. Consider the formula φ_ε below, which may be expressed as a formula from WsHML^{wHML}:

$$\varphi_{\varepsilon} = [[\varepsilon]]\langle\langle\alpha\rangle\rangle\mathsf{tt} \equiv [[\varepsilon]](\langle\langle\alpha\rangle\rangle\mathsf{tt} \vee \mathsf{ff}) \in \mathrm{WsHML}^{w\mathrm{HML}}.$$

Formula φ_ε is not equivalent to ⟨⟨α⟩⟩tt (*e.g.* the process α.nil + τ.nil satisfies ⟨⟨α⟩⟩tt, but not φ_ε), meaning that [[ε]] plays a discerning role in the context of WμHML. Furthermore, φ_ε holds for process τ.α.nil, but not for α.nil + τ.nil, even though these two processes cannot be distinguished by *any* WμHML formula. In fact, it turns out that they are bisimilar with respect to *weak external transitions* and this bisimulation characterises the satisfaction of WμHML formulas [24]. Thus, there is no formula in WμHML that is equivalent to φ_ε.

**Previous Runs and Alternating Monitoring.** A monitoring system could reuse information from previous system runs, perhaps recorded as execution logs, and whenever (sub)traces can be associated with specific states of the system, these can also be used as an instantiation for our parametric framework. More concretely, in [21,22] it is shown that traces can be used to characterise the violation of WsHML formulas, or the satisfaction of formulas from the dual fragment, WcHML, defined below.

**Definition 13.** *The* co-safety WμHML *is defined by the grammar:*

$$\pi, \kappa \in \mathrm{WcHML} ::= \mathsf{tt} \;\mid\; \mathsf{ff} \;\mid\; \langle\langle\alpha\rangle\rangle\pi \;\mid\; \pi \vee \kappa \;\mid\; \mathsf{min}\ X.\pi \;\mid\; X.$$

The witnessed rejection and acceptance traces can in turn be used as part of an augmented trace for an instantiation for C to obtain the monitorable dual logics WsHMLWcHML and WcHMLWsHML that alternate between rejection monitoring and acceptance monitoring. The logic WsHMLWcHML is defined by the following grammar, where <sup>θ</sup> <sup>∈</sup> WsHML:

$$\varphi, \psi ::= \mathtt{tt} \;\mid\; \mathtt{ff} \;\mid\; [[\alpha]]\varphi \;\mid\; [[\varepsilon]](\theta \vee \varphi) \;\mid\; \varphi \wedge \psi \;\mid\; \mathtt{max}\ X.\varphi \;\mid\; X;$$

and WcHML<sup>WsHML</sup> is defined by the following grammar, where χ ∈ WcHML:

π, κ ::= tt | ff | ⟨⟨α⟩⟩π | ⟨⟨ε⟩⟩(χ ∧ π) | π ∨ κ | min X.π | X.

**Proposition 4.** *The monitoring system* (M<sub>e</sub><sup>WcHML</sup>, I<sub>e</sub><sup>WcHML</sup>) *rejection-monitors for the logical fragment* WsHML<sup>WcHML</sup>*.*

One should observe that in this case, WsHML<sup>WcHML</sup> *is* a fragment of WμHML, in contrast to the previous instantiation WsHML<sup>wHML</sup> from Sect. 5.2.

**Lemma 1.** *For every* [[ε]](η ∨ ϕ) ∈ WsHML<sup>WcHML</sup> *(where* η ∈ WsHML*), we have* [[ε]](η ∨ ϕ) ≡ η ∨ ϕ*.*

**Corollary 1.** *For every formula in* WsHML<sup>WcHML</sup>*, there is a logically equivalent formula in* WμHML*.*

This entails that WsHML<sup>WcHML</sup> can be reformulated using the following, simpler, grammar (here η ∈ WsHML), which is clearly a fragment of WμHML:

ϕ, ψ ::= tt | ff | [[α]]ϕ | η ∨ ϕ | ϕ ∧ ψ | max X.ϕ | X.

If the monitoring system can use such information from previous runs, there is no reason to limit this information to just one previous run. If the instrumentation mechanism can record up to i prior runs, the monitorable logic may be described as WsHML<sup>i+1</sup>, defined inductively in the following way:

– WsHML<sup>1</sup> = WsHML and WcHML<sup>1</sup> = WcHML; and
– WsHML<sup>i+1</sup> = WsHML<sup>WcHML<sup>i</sup></sup> and WcHML<sup>i+1</sup> = WcHML<sup>WsHML<sup>i</sup></sup>.

Whenever this setup can be extended to unlimited prior runs, the resulting rejection-monitorable fragment would be WsHML<sup>ω</sup> = ⋃<sub>i</sub> WsHML<sup>i</sup>, which is also described by the following grammar:

ϕ, ψ ::= tt | ff | [[α]]ϕ | ϕ ∨ ψ | ϕ ∧ ψ | max X.ϕ | X.

WsHML<sup>ω</sup> is a non-trivial extension of WsHML *which is still within* WμHML.

**Failure to Execute an Action and Refusals.** In Subsect. 4.1, we instantiated the condition set C as the set of formulas from μHML that assert the inability of a process to perform an action; these formulas are of the form [α]ff. We recast this approach in the setting of weak monitorability. In this setting, where the monitoring system and the specification formulas ignore any silent transitions, the inability of a process to perform an α-transition acquires a different meaning from the one used for the basic system. In particular, we consider a stronger version of these conditions that incorporates stability, which makes them invariant over τ-transitions. We say that p *refuses* α when p can perform neither a τ-transition nor an α-transition. In [31], a very similar notion is used for refusal testing (see also [1]). Thus, much in line with [31], we use the following definition.

**Definition 14.** *A process* p *of an LTS* L refuses *action* α ∈ Act*, written* p *ref* α*, when* p *admits neither a* τ*-transition nor an* α*-transition in* L*. The set of conditions that corresponds to refusals is thus* R<sub>Act</sub> = {[τ]ff ∧ [α]ff | α ∈ Act}*.*
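Refusal is directly computable from a finite transition table; the sketch below (an ad-hoc encoding of our own, not part of the paper) checks the two defining conditions: no τ-transition and no α-transition.

```python
# Transition tables: state -> set of (action, state); "tau" is the silent action.
L = {
    "p": {("b", "nil")},    # p = b.nil : stable
    "q": {("tau", "nil")},  # q = tau.nil : unstable
    "nil": set(),
}

def refuses(lts, p, alpha):
    """p ref alpha: p admits neither a tau-transition nor an alpha-transition."""
    labels = {act for act, _ in lts[p]}
    return "tau" not in labels and alpha not in labels

print(refuses(L, "p", "a"), refuses(L, "p", "b"), refuses(L, "q", "a"))  # True False False
```

Stability is what makes the condition invariant over τ-transitions: the unstable q refuses nothing, regardless of which visible actions it lacks.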

According to Theorem 4, the largest fragment of μHML that we can monitor for, using monitors that can detect refusals, is WsHML<sup>R<sub>Act</sub></sup>, given by the following grammar:

$$\varphi,\psi ::= \mathtt{tt} \;\mid\; \mathtt{ff} \;\mid\; [[\alpha]]\varphi \;\mid\; [[\varepsilon]](\langle \tau \rangle \mathtt{tt} \vee \langle \alpha \rangle \mathtt{tt} \vee \varphi) \;\mid\; \varphi \wedge \psi \;\mid\; \mathtt{max}\ X.\varphi \;\mid\; X.$$

Again, ⟨τ⟩tt ∨ ⟨α⟩tt ∨ ϕ is best read as the implication ([τ]ff ∧ [α]ff) → ϕ: if the process is stable and cannot perform an α-transition, then ϕ must hold.

**Proposition 5.** *The monitoring system* (M<sub>e</sub><sup>R<sub>Act</sub></sup>, I<sub>e</sub><sup>R<sub>Act</sub></sup>) *monitors for the logical fragment* WsHML<sup>R<sub>Act</sub></sup>*.*

*Example 3.* Consider the formula

$$
\varphi_s = [[\varepsilon]](\langle \tau \rangle \mathtt{tt} \vee \langle \alpha \rangle \mathtt{tt} \vee [[\beta]] \mathtt{ff}) \in \mathrm{WsHML}^{R_{\mathrm{Act}}}.
$$

Formula ϕ<sub>s</sub> states that at every stable state that the system can reach, if action α is impossible, then action β should also be impossible. We can see that ϕ<sub>s</sub> is true for τ.nil + β.nil, but not for β.nil. However, the two processes cannot be distinguished by WμHML, as they have the same weak external transitions. Therefore, WsHML<sup>R<sub>Act</sub></sup> is not a fragment of WμHML, although, as we have seen, it is a fragment of μHML. Indeed, the subformula ⟨τ⟩tt, which asserts that the process can perform a silent transition, is clearly not part of WμHML.
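The two verdicts of Example 3 can be reproduced by evaluating ϕ<sub>s</sub> clause by clause on finite transition tables. In the sketch below (encoding and helper names are ours), the strong diamonds ⟨τ⟩tt and ⟨α⟩tt are checked at each τ-derivative, and [[β]]ff is read as the absence of a weak β-transition.

```python
def tau_closure(lts, s):
    """All states t with s ==> t (zero or more tau-steps)."""
    seen, stack = {s}, [s]
    while stack:
        for act, t in lts[stack.pop()]:
            if act == "tau" and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def can(lts, s, a):
    """s |= <a>tt (strong diamond): s has an a-transition."""
    return any(act == a for act, _ in lts[s])

def weak_box_ff(lts, s, a):
    """s |= [[a]]ff : no tau-derivative of s can perform a."""
    return not any(can(lts, t, a) for t in tau_closure(lts, s))

def sat_phi_s(lts, p, alpha, beta):
    """p |= [[eps]](<tau>tt v <alpha>tt v [[beta]]ff)."""
    return all(can(lts, t, "tau") or can(lts, t, alpha) or weak_box_ff(lts, t, beta)
               for t in tau_closure(lts, p))

P1 = {"p1": {("tau", "nil"), ("b", "nil")}, "nil": set()}  # tau.nil + b.nil
P2 = {"p2": {("b", "nil")}, "nil": set()}                  # b.nil
print(sat_phi_s(P1, "p1", "a", "b"), sat_phi_s(P2, "p2", "a", "b"))  # True False
```

As in the example, τ.nil + β.nil satisfies ϕ<sub>s</sub> (its only stable derivative, nil, offers neither α nor β), while the stable β.nil violates it.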

*Example 4.* Let us consider an LTS L<sub>0</sub> of stable processes, that is, an LTS without any silent transitions. L<sub>0</sub> offers a simplified setting in which to cast our observations. In this case, the [[ε]], [τ], and ⟨τ⟩ modalities can be eliminated from our formulas, and weak modalities are equivalent to strong modalities. This allows us to simplify the grammar for WsHML<sup>R<sub>Act</sub></sup> as follows:

$$\begin{array}{ccccc} \varphi, \psi ::= \mathtt{tt} & \mid \mathtt{ff} & \mid \, [\alpha] \varphi & \mid \, \langle \alpha \rangle \mathtt{tt} \lor \varphi \\ & \mid \, \varphi \land \psi & \mid \, \mathtt{max} \, X. \varphi & \mid \, X. \end{array}$$

Perhaps unsurprisingly, this grammar yields the same formulas as the restriction of the grammar of Subsect. 4.1 to external actions. An instance of a specification that can be formalized in this fragment is the following. Consider a simple server-client system, where the client can request a resource, represented by action rq, and the server may give a positive response, represented by action rs, after which it needs to allocate said resource to the client, represented by action al. A reasonable specification for the server is that if it is impossible at the moment to provide a resource, then it should not give a positive response to the client. In the above simplification of WsHML<sup>F<sub>Act</sub></sup>, this specification can be formalized as [rq](⟨al⟩tt ∨ [rs]ff). If the LTS includes silent transitions, the corresponding specification would be written as

$$
\varphi_r = [[\mathtt{rq}]][[\varepsilon]](\langle \tau \rangle \mathtt{tt} \vee \langle \mathtt{al} \rangle \mathtt{tt} \vee [[\mathtt{rs}]] \mathtt{ff}).
$$

In other words, after a request, if the server cannot provide a resource and it is stable, so that there is no possibility of the resource becoming available after some internal computation, then the server should not give a positive response to the client.
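On the stable (τ-free) server LTS, the simplified specification [rq](⟨al⟩tt ∨ [rs]ff) amounts to a purely local check on the states reached by a request. A minimal sketch, with two hypothetical servers of our own devising:

```python
def sat_spec(lts, s):
    """s |= [rq](<al>tt v [rs]ff) on a tau-free LTS: after every
    rq-transition, either al is enabled or rs is not."""
    for act, t in lts[s]:
        if act == "rq":
            enabled = {a for a, _ in lts[t]}
            if "al" not in enabled and "rs" in enabled:
                return False
    return True

GOOD = {"g": {("rq", "g1")},
        "g1": {("al", "g2"), ("rs", "g3")},  # can allocate, may respond
        "g2": set(), "g3": set()}
BAD = {"b": {("rq", "b1")},
       "b1": {("rs", "b2")},                 # responds although al is impossible
       "b2": set()}
print(sat_spec(GOOD, "g"), sat_spec(BAD, "b"))  # True False
```

The check is local precisely because, in the absence of silent transitions, the refusal condition collapses to the inability to perform al at the state reached by rq.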

### **6 Conclusions**

In order to devise effective verification strategies that straddle the pre- and post-deployment phases of software production, one needs a better understanding of the monitorability aspects of the correctness properties to be verified. We have presented a general framework that allows us to determine maximal monitorable fragments of an expressive logic, namely μHML, that is agnostic of the verification technique employed. By way of a number of instantiations, we also show how the framework can be used to reason about the monitorability induced by various forms of augmented traces. Our next immediate concern is to validate the proposed instantiations empirically by constructing monitoring systems and tools based on these results, as we already did for the original monitorability results of [21,22] in [9,10,12].

*Related Work.* Monitorability for μHML was first examined in [21,22]. This work introduced the external monitoring system and identified WsHML as the largest monitorable fragment of μHML with respect to that system. The ensuing work in [2] focused on monitoring setups that can distinguish silent actions to a varying degree, and introduced the basic monitoring system, showing analogous monitorability results for μHML.

Monitorability has also been examined for languages defined over traces, such as LTL. Pnueli and Zaks [32] define a notion of monitorability over traces, although they do not attempt maximal monitorability results. Diekert and Leucker revisited monitorability from a topological perspective in [16]. Falcone *et al.* [17] extended the work in [32] to incorporate enforcement and introduced a notion of monitorability on traces that is parameterized with respect to a truth domain, which corresponds to our separation into acceptance- and rejection-monitorable properties. In [13], the authors use a monitoring system that can generate derivations of satisfied formulas from a fragment of LTL; however, they do not argue that this fragment is in any way maximal. There is a significant body of work on synthesizing monitors from LTL formulas, *e.g.* [13,23,33,35], and it would be worth investigating whether our general techniques for monitor synthesis can be applied effectively in these cases.

Phillips introduced *refusal testing* in [31] as a way to extend the capabilities of testing (see [18] for a discussion on how our monitoring setup relates to testing preorders). The meaning of refusals in [31] is very close to the one in Definition 14 and it is interesting to note how Phillips' use of tests for refusal formulas is similar to our monitoring mechanisms for refusals. Abramsky [1] uses refusals in the context of a much more powerful testing machinery, in order to identify the kind of testing power that is required for distinguishing non-bisimilar processes.

The decomposition of the verification burden across verification techniques, or across iterations of alternating monitoring runs as presented in Sect. 5, can be seen as a method for *quotienting*. In [7] Andersen studies quotienting of the specification logics discussed in this paper to reduce the state-space during model checking and thus increase its efficiency (see also [27] for a more recent treatment). The techniques used rely heavily on the model's concurrency constructs and may produce formulas that are larger in size than the original, but which can be checked against a smaller component of the model. In multi-pronged approaches to verification one would expect to encounter similar difficulties occasionally.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Logics for Bisimulation and Divergence**

Xinxin Liu, Tingting Yu, and Wenhui Zhang

State Key Laboratory of Computer Science, Institute of Software, CAS, University of Chinese Academy of Sciences, Beijing, China *{*xinxin,yutt,zwh*}*@ios.ac.cn

**Abstract.** The study of modal logics and various bisimulation equivalences so far shows the following progression: 1. weak bisimilarity is characterized by Hennessy-Milner logic (HML), a simple propositional modal logic with a weak possibility modality, and 2. extending HML by refining the weak possibility modality one obtains a logic which characterizes branching bisimilarity, a refinement of weak bisimilarity, and 3. further extending the logic with a divergence modality one obtains a logic which characterizes branching bisimilarity with explicit divergence, a refinement of branching bisimilarity. In this paper, we explore the development by exchanging the above 2 and 3, i.e. by first extending HML with a divergence modality and then refining the weak possibility modality in the extended logic. We have the following findings: A. extending HML with a new divergence modality one obtains a new logic which characterizes complete weak bisimilarity, an equivalence relation with distinguishing power in between weak bisimilarity and branching bisimilarity with explicit divergence; B. further extending the obtained logic by refining the weak possibility modality in it one obtains another logic which characterizes branching bisimilarity with explicit divergence. As main results of the paper, the logic in A. provides a modal characterization for complete weak bisimilarity, and moreover the two new logics in A. and B. are both sub-logics of the known logic obtained in above 3.

### **1 Introduction**

Weak bisimilarity is a popular equivalence relation introduced by Milner [9]. It is defined through the notion of weak bisimulation which was proposed by Milner [9] based on an idea independently discovered by van Benthem [4] and Park [8]. The importance of weak bisimulation is that it not only defines an equivalence relation but also provides a verification technique for the equality. A well-known theoretical result for weak bisimilarity is that the equivalence is characterized by a modal logic which is known as Hennessy-Milner logic (HML) [2] in the following sense: two processes are equivalent with respect to weak bisimilarity if and only if they satisfy exactly the same set of HML formulas.

Supported by the CAS-INRIA major project No. GJHZ1844.

c The Author(s) 2018

C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 221–237, 2018. https://doi.org/10.1007/978-3-319-89366-2\_12

Because weak bisimilarity does not preserve divergence, i.e. two equivalent processes may differ in that one of them is capable of endless internal computation while the other is not, various divergence-preserving versions of weak bisimulation equivalences and pre-orders have been studied [1,3,5,13]. Complete weak bisimilarity is a newly proposed divergence-preserving weak bisimulation equivalence [10]. Like weak bisimilarity, complete weak bisimilarity is supported by a bisimulation verification technique, called inductive weak bisimulation, which can be very helpful in practical verification that concerns divergence. One of the main aims of this paper is to find a modal logic which characterizes complete weak bisimilarity, just as HML characterizes weak bisimilarity.

We will put our study into a more general context. The study of modal logics and various bisimulation equivalences so far shows the following progression, which reveals the correlated increase of the expressive power of the logics and the distinguishing power of the equivalences:

1. Weak bisimilarity is characterized by Hennessy-Milner logic (HML), a simple propositional modal logic with a weak possibility modality.
2. Extending HML by refining the weak possibility modality, one obtains a logic which characterizes branching bisimilarity, a refinement of weak bisimilarity.
3. Further extending the logic with a divergence modality, one obtains a logic which characterizes branching bisimilarity with explicit divergence, a refinement of branching bisimilarity.
In this paper, we explore the development by exchanging the order of 2 and 3, i.e. by first extending HML with a divergence modality and then refining the weak possibility modality in the extended logic. We have the following findings:

A. Extending HML with a new divergence modality, one obtains a new logic which characterizes complete weak bisimilarity, an equivalence relation with distinguishing power in between weak bisimilarity and branching bisimilarity with explicit divergence.
B. Further extending the obtained logic by refining the weak possibility modality in it, one obtains another logic which characterizes branching bisimilarity with explicit divergence.
To summarize the results of the paper:

– The logic in A. provides a modal characterization for complete weak bisimilarity.
– The two new logics in A. and B. are both sub-logics of the known logic obtained in 3. above.
The rest of the paper is organized as follows. Section 2 presents the definitions of the equalities, i.e. weak bisimilarity, complete weak bisimilarity, branching bisimilarity, and branching bisimilarity with explicit divergence. Section 3 studies the relationships of the modal logic characterizations of the equalities. Section 4 studies reductions for decision problems concerning finite-state processes. Section 5 concludes.

### **2 Bisimulations and Divergence**

In this section, after settling some necessary preliminaries, we introduce the main equivalence relation, i.e. complete weak bisimilarity, together with some related equivalences like branching bisimilarity and branching bisimilarity with explicit divergence.

**Definition 1** *(*Labeled transition systems*)***.** *A* labeled transition system *(or LTS) is a triple* A = ⟨S, A, −→⟩ *where:*

– S *is a set of states;*
– A *is a set of visible actions, and* τ ∉ A *is the silent action;*
– −→ ⊆ S × (A ∪ {τ}) × S *is the transition relation.*

A (finite or infinite) run ρ from s is a sequence of transitions s −→<sup>α<sub>1</sub></sup> s<sub>1</sub> −→<sup>α<sub>2</sub></sup> s<sub>2</sub> ···, and Act(ρ) = α<sub>1</sub>α<sub>2</sub> ··· denotes its sequence of actions; a τ-run is a run all of whose actions are τ. For l ∈ (A ∪ {τ})<sup>∗</sup>, let l̂ ∈ A<sup>∗</sup> be the sequence obtained by deleting all τ's from l.

We use standard notations for multi-step τ-transitions and the so-called double-arrow transitions: write s =⇒ s′ if there is a finite τ-run from s to s′; write s =⇒<sup>α</sup> s′ if there exist t, t′ such that s =⇒ t, t −→<sup>α</sup> t′, and t′ =⇒ s′. Note the important difference between s =⇒ s′ and s =⇒<sup>τ</sup> s′: the former means that from s to s′ there is a finite τ-run (possibly a τ-run of zero length), while the latter means that from s to s′ there is a finite τ-run of non-zero length. Thus s =⇒ s holds for all s ∈ S, while s =⇒<sup>τ</sup> s holds only when s is on a τ-loop consisting of one or more τ-transitions. Also, for l ∈ (A ∪ {τ})<sup>∗</sup> we will write s =⇒<sup>l̂</sup> s′ if there is a finite run ρ from s to s′ such that Act(ρ) and l coincide after deleting all τ's. Note that s =⇒ s′ means exactly s =⇒<sup>ε</sup> s′, where ε is the empty string.
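On a finite LTS the double-arrow transitions are computable by saturating under τ-steps. The sketch below (our own encoding, not from the paper) computes s =⇒ s′ as a τ-closure, and s =⇒<sup>α</sup> s′ as closure, one α-step, closure; invoked with α = τ it yields exactly the non-zero-length reading of s =⇒<sup>τ</sup> s′ discussed above.

```python
def closure(lts, states):
    """Saturate a set of states under tau-steps: all t with s ==> t."""
    seen, stack = set(states), list(states)
    while stack:
        for act, t in lts[stack.pop()]:
            if act == "tau" and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def weak_post(lts, s, alpha):
    """All s2 with s ==alpha==> s2, i.e. s ==> t --alpha--> t2 ==> s2.
    For alpha == "tau" this demands at least one real tau-step."""
    pre = closure(lts, {s})
    mid = {t2 for t in pre for act, t2 in lts[t] if act == alpha}
    return closure(lts, mid)

L = {"s": {("tau", "u")}, "u": {("a", "v")}, "v": {("tau", "w")}, "w": set()}
print(sorted(closure(L, {"s"})))         # ['s', 'u']
print(sorted(weak_post(L, "s", "a")))    # ['v', 'w']
print(sorted(weak_post(L, "s", "tau")))  # ['u']
```

Note that `weak_post(L, "s", "tau")` does not contain `"s"` itself: the zero-length τ-run counts for =⇒ but not for =⇒<sup>τ</sup>.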

Next, we review the well-known notions of weak bisimulation, weak bisimilarity [9], and branching bisimulation, branching bisimilarity [12].

**Definition 2** *(*Weak and branching bisimulations*)***.** *Let* A = ⟨S, A, −→⟩ *be an LTS. A binary relation* R ⊆ S × S *is a* weak bisimulation *if it is symmetric and moreover for all* (s, t) ∈ R *the following holds:*

*whenever* s −→<sup>α</sup> s′*, then there exists* t′ *such that* t =⇒<sup>α̂</sup> t′ *and* (s′, t′) ∈ R*.*

*A binary relation* R ⊆ S × S *is a* branching bisimulation *if it is symmetric and moreover for all* (s, t) ∈ R *the following holds:*

*whenever* s −→<sup>α</sup> s′*, then either* α = τ *and there exists* t′ *such that* t =⇒ t′ *and* (s, t′), (s′, t′) ∈ R*, or there exist* t′, t″ *such that* t =⇒ t′*,* t′ −→<sup>α</sup> t″ *and* (s, t′), (s′, t″) ∈ R*.*

*Now define two relations* ≈, ≈<sub>b</sub> *as follows:*

≈ = ⋃{R | R is a weak bisimulation},  ≈<sub>b</sub> = ⋃{R | R is a branching bisimulation}.
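On a finite LTS, ≈ can be computed directly from Definition 2 as a greatest fixpoint: start from the full relation and repeatedly delete pairs that violate the transfer condition until nothing changes. The following naive Python sketch (encoding and names are ours; an actual tool would use partition refinement) also shows, on a τ-loop, that ≈ ignores divergence.

```python
def tau_closure(lts, s):
    seen, stack = {s}, [s]
    while stack:
        for act, t in lts[stack.pop()]:
            if act == "tau" and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def weak_answers(lts, t, act):
    """States t1 with t matching a move labelled act: a tau-move may be
    answered by zero or more tau-steps (the alpha-hat convention)."""
    pre = tau_closure(lts, t)
    if act == "tau":
        return pre
    mid = {t2 for u in pre for a, t2 in lts[u] if a == act}
    return {t1 for t2 in mid for t1 in tau_closure(lts, t2)}

def weak_bisimilarity(lts):
    """Greatest fixpoint: delete (s, t) whenever some move of s cannot
    be matched by t, until the relation is stable."""
    R = {(s, t) for s in lts for t in lts}
    changed = True
    while changed:
        changed = False
        for s, t in list(R):
            for a, s1 in lts[s]:
                if not any((s1, t1) in R for t1 in weak_answers(lts, t, a)):
                    R.discard((s, t))
                    R.discard((t, s))
                    changed = True
                    break
    return R

L = {"a_nil": {("a", "nil")},          # a.nil
     "tau_a_nil": {("tau", "a_nil")},  # tau.a.nil
     "loop": {("tau", "loop")},        # pure divergence
     "nil": set()}
R = weak_bisimilarity(L)
print(("a_nil", "tau_a_nil") in R)  # True: a.nil ~ tau.a.nil
print(("loop", "nil") in R)         # True: weak bisimilarity ignores divergence
```

The second verdict anticipates the divergence discussion below: the τ-loop and the deadlocked nil are weakly bisimilar even though only the former diverges.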

The notions of weak and branching bisimulations enjoy some nice properties, as stated in the following Lemmas 1 and 2, which then lead to the important Theorem 1 that justifies Definition 2.

**Lemma 1.** *If* {R<sub>i</sub> | i ∈ I} *is a set of weak bisimulations, then* ⋃{R<sub>i</sub> | i ∈ I} *is a weak bisimulation. If* {R<sub>i</sub> | i ∈ I} *is a set of branching bisimulations, then* ⋃{R<sub>i</sub> | i ∈ I} *is a branching bisimulation.*

For two binary relations R<sub>1</sub>, R<sub>2</sub>, we write R<sub>1</sub> · R<sub>2</sub> for the composition of R<sub>1</sub> and R<sub>2</sub>, i.e. R<sub>1</sub> · R<sub>2</sub> = {(s, t) | ∃u. (s, u) ∈ R<sub>1</sub>, (u, t) ∈ R<sub>2</sub>}.

**Lemma 2.** *If* R<sub>1</sub>, R<sub>2</sub> *are weak bisimulations, then* R<sub>1</sub> · R<sub>2</sub> ∪ R<sub>2</sub> · R<sub>1</sub> *is also a weak bisimulation. If* R<sub>1</sub>, R<sub>2</sub> *are branching bisimulations, then* R<sub>1</sub> · R<sub>2</sub> ∪ R<sub>2</sub> · R<sub>1</sub> *is also a branching bisimulation.*

The proofs of the above two lemmas directly follow from Definition 2 (Note that we modified the conditions for branching bisimulation as in [11]). With the above two lemmas, it is routine to prove the following theorem, which justifies the definitions of ≈ and ≈b.

**Theorem 1.** ≈ *is an equivalence relation, and it is the largest weak bisimulation.* ≈<sub>b</sub> *is an equivalence relation, and it is the largest branching bisimulation.*

With Theorem 1, ≈ and ≈<sub>b</sub> are usually called *weak bisimilarity* and *branching bisimilarity* respectively.

It is well known that neither ≈ nor ≈<sub>b</sub> preserves divergence, i.e. there can be two states s and t with s ≈ t such that there is an infinite τ-run from s but no infinite τ-run from t.

In order to obtain divergence preserving relations, we can adopt the approach used in [12] by introducing the following definition.

**Definition 3** *(*Weak and branching bisimulations with explicit divergence*)***.** *Let* A = ⟨S, A, −→⟩ *be an LTS. A state* s ∈ S *is said to be* divergent with respect to *an equivalence relation* ≡*, written* s ⇑<sub>≡</sub>*, if from* s *there is an infinite* τ*-run* ρ *such that all the states on* ρ *are* ≡*-equivalent to* s*.*

*An equivalence relation* ≡ *on* S *is called a* weak bisimulation with explicit divergence *if* ≡ *is a weak bisimulation and moreover whenever* s ≡ t *it holds that* s ⇑<sub>≡</sub> *if and only if* t ⇑<sub>≡</sub>*.*

*An equivalence relation* ≡ *on* S *is called a* branching bisimulation with explicit divergence *if* ≡ *is a branching bisimulation and moreover whenever* s ≡ t *it holds that* s ⇑<sub>≡</sub> *if and only if* t ⇑<sub>≡</sub>*.*

*Now define two relations* ≈<sup>Δ</sup>, ≈<sub>b</sub><sup>Δ</sup> *as follows:*

≈<sup>Δ</sup> = ⋃{≡ | ≡ is a weak bisimulation with explicit divergence},
≈<sub>b</sub><sup>Δ</sup> = ⋃{≡ | ≡ is a branching bisimulation with explicit divergence}.

≈<sup>Δ</sup> *and* ≈<sub>b</sub><sup>Δ</sup> *are called* weak bisimilarity with explicit divergence *and* branching bisimilarity with explicit divergence *respectively.*

At this point, let us see a non-trivial example of a branching bisimulation with explicit divergence. Define ≡<sub>sc</sub>, the *strongly connected* relation, by s ≡<sub>sc</sub> t if and only if s =⇒ t and t =⇒ s. That is, s ≡<sub>sc</sub> t just in case s and t can reach each other by performing τ actions. It only takes a second to check that ≡<sub>sc</sub> is an equivalence relation. Moreover we have:

**Proposition 1.** ≡<sub>sc</sub> *is a branching bisimulation with explicit divergence.*
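On a finite LTS, ≡<sub>sc</sub> is just mutual τ-reachability, so it can be computed from the τ-closure of each state. A small sketch with our own encoding (Tarjan's strongly-connected-components algorithm on the τ-edges would compute the same relation in linear time):

```python
def tau_reach(lts, s):
    """All t with s ==> t (zero or more tau-steps)."""
    seen, stack = {s}, [s]
    while stack:
        for act, t in lts[stack.pop()]:
            if act == "tau" and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def strongly_connected(lts):
    """s ==sc t iff s ==> t and t ==> s."""
    reach = {s: tau_reach(lts, s) for s in lts}
    return {(s, t) for s in lts for t in lts
            if t in reach[s] and s in reach[t]}

# x and y lie on a tau-cycle; z is only reachable via a visible action.
L = {"x": {("tau", "y")}, "y": {("tau", "x"), ("a", "z")}, "z": set()}
sc = strongly_connected(L)
print(("x", "y") in sc, ("x", "z") in sc)  # True False
```

Reflexivity, symmetry and transitivity of the computed relation are immediate from mutual reachability, matching the claim that ≡<sub>sc</sub> is an equivalence relation.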

The following lemma is easy to prove.

**Lemma 3.** *If* ≡ *is a weak bisimulation with explicit divergence, then* ≡ *preserves divergence, i.e. whenever* s ≡ t *then there is an infinite* τ *-run from* s *if and only if there is one from* t*.*

With this lemma, we can show that ≈<sup>Δ</sup> preserves divergence as follows. If ρ is an infinite τ-run from s and s ≈<sup>Δ</sup> t, then there is a weak bisimulation with explicit divergence ≡ such that s ≡ t; then by Lemma 3 there is an infinite τ-run from t, and thus ≈<sup>Δ</sup> preserves divergence. One is tempted to say that, with Lemma 3, ≈<sup>Δ</sup> obviously preserves divergence, since ≈<sup>Δ</sup> is a weak bisimulation with explicit divergence. However, to apply Lemma 3 in this way, we first have to prove that ≈<sup>Δ</sup> *is* a weak bisimulation with explicit divergence, and at least for the moment we do not know whether this is indeed the case.

Thus, just as the definitions of ≈ and ≈<sub>b</sub> are justified by Theorem 1, the definitions of ≈<sup>Δ</sup> and ≈<sub>b</sub><sup>Δ</sup> also need justification. That is to say, we need to confirm that ≈<sup>Δ</sup> as defined is indeed the largest weak bisimulation with explicit divergence, and ≈<sub>b</sub><sup>Δ</sup> the largest branching bisimulation with explicit divergence (as matters stand after the definition, we do not even know whether ≈<sup>Δ</sup> and ≈<sub>b</sub><sup>Δ</sup> are equivalence relations!). But this time the task is not as easy, since we no longer have lemmas corresponding to Lemmas 1 and 2 available, as we did for Theorem 1. As a matter of fact, this implies that we do not know whether the notion of weak bisimulation with explicit divergence is a fixed point of some monotonic function on the complete lattice of equivalence relations, and hence the Knaster-Tarski fixed-point theorem is not applicable in this case. Thus we need to find a different way to justify Definition 3. For the time being we have the following obvious lemma, which clarifies the justification task.

**Lemma 4.** ≈<sup>Δ</sup> *(*≈<sub>b</sub><sup>Δ</sup>*) is the largest weak (branching) bisimulation with explicit divergence if and only if the largest weak (branching) bisimulation with explicit divergence exists.*

A justification of the definition of ≈<sub>b</sub><sup>Δ</sup> can be found in [13,14], though not in [12], where it was first introduced. While a justification for ≈<sub>b</sub><sup>Δ</sup> might be taken for granted, a justification for ≈<sup>Δ</sup> seems all the more necessary. This is because in a weak bisimulation equivalence relation, unlike a branching bisimulation, an infinite τ-run from a process may be matched by an infinite τ-run from a related process in such a way that the sequences of equivalence classes passed through by the two runs differ. So one needs to be more careful in dealing with ≈<sup>Δ</sup>. According to Lemma 4, in order to prove that ≈<sup>Δ</sup> is a weak bisimulation with explicit divergence, we only need to show that the largest weak bisimulation with explicit divergence exists. This approach was taken in [10], where two relations called *complete weak bisimilarity* and *complete branching bisimilarity* were constructed and proved to be the largest weak bisimulation with explicit divergence and the largest branching bisimulation with explicit divergence, respectively. In this paper, for self-containment, we will present a justification of the definition of ≈<sup>Δ</sup> in the next section, by using the logical characterization result. For convenience of naming, in this paper we will freely use the names complete weak (branching) bisimilarity as synonyms for weak (branching) bisimilarity with explicit divergence.

### **3 Modal Characterization**

The main aim of this section is to find a modal logic characterization of complete weak bisimilarity ≈<sup>Δ</sup>, and to study its relationship with the logic characterizations of other bisimulation equivalences. For that, we first review some of the existing logic characterization results.

In [2] a modal logic, later known as Hennessy-Milner logic (HML), was introduced, and it was proved that two given processes are equivalent under weak bisimilarity ≈ if and only if they satisfy the same set of HML formulas. This is the so-called Hennessy-Milner theorem. The key constructor in HML is the weak possibility modality ⟨⟨u⟩⟩F, which asserts that after the observation of u some state with property F is reached. In [6], the weak possibility modality was refined to an *until* modality of the form F<sub>1</sub>⟨α⟩F<sub>2</sub>, meaning that there is a finite τ-run such that all the states on it satisfy F<sub>1</sub>, and the last state can perform an α action and arrive at a state satisfying F<sub>2</sub>; it was proved that the refined logic characterizes branching bisimilarity ≈<sub>b</sub>, just as HML characterizes weak bisimilarity. In [5] the weak possibility modality was refined to a *just-before* modality of the form F<sub>1</sub>{α}F<sub>2</sub>, meaning that there is a finite τ-run such that the last state satisfies F<sub>1</sub> and can perform an α action and arrive at a state satisfying F<sub>2</sub>; it was proved that the refined logic, named Φ<sub>jb</sub>, also characterizes branching bisimilarity ≈<sub>b</sub>. In [13], Φ<sub>jb</sub> was further extended to the logic Φ<sub>jb</sub><sup>Δ</sup> with a divergence modality of the form ΔF, meaning that there is an infinite τ-run on which eventually all the states satisfy F, and it was proved that Φ<sub>jb</sub><sup>Δ</sup> characterizes branching bisimilarity with explicit divergence ≈<sub>b</sub><sup>Δ</sup>.

As the starting point of the work of this paper, we describe a modal logic HMLbΔ, which is basically Φ<sub>jb</sub><sup>Δ</sup> with a derived operator ⟨⟨u⟩⟩. The set of formulas of HMLbΔ is defined by the following BNF rules:

$$F ::= \bigwedge_{i \in I} F_i \;\Big|\; \neg F \;\Big|\; F_1 \{u\} F_2 \;\Big|\; \langle\!\langle u \rangle\!\rangle F \;\Big|\; \Delta F$$

where I is an index set which could be infinite, {u} (with u ∈ A ∪ {ε}) is the *just-before* modality introduced in [5], ⟨⟨u⟩⟩ is the usual *weak possibility* modality as in [9], and Δ is the *divergence* modality introduced in [13].

**Definition 4.** *Let* A = ⟨S, A, −→⟩ *be an LTS. The satisfaction relation* |= *between states and formulas of HMLb*Δ *is defined by induction on the structure of formulas as follows:*

1. s |= ⋀<sub>i∈I</sub> F<sub>i</sub> if s |= F<sub>i</sub> for every i ∈ I;
2. s |= ¬F if s |= F does not hold;
3. s |= F<sub>1</sub>{u}F<sub>2</sub> if there exists s′ with s =⇒ s′ and s′ |= F<sub>1</sub> such that either u = ε and s′ |= F<sub>2</sub>, or u ∈ A and s′ −→<sup>u</sup> s″ for some s″ with s″ |= F<sub>2</sub>;
4. s |= ⟨⟨ε⟩⟩F if there exists s′ with s =⇒ s′ and s′ |= F, and s |= ⟨⟨a⟩⟩F (for a ∈ A) if there exists s′ with s =⇒<sup>a</sup> s′ and s′ |= F;
5. s |= ΔF if there is an infinite τ-run from s on which eventually all the states satisfy F.

First note that this logic can express some interesting properties of the infinite behaviours of processes. For example, Δ**true** asserts the existence of an infinite τ-run, where **true** is shorthand for ⋀<sub>i∈∅</sub> F<sub>i</sub> (which is the first formula of HMLbΔ according to the BNF rules). The logic is basic; however, it might be more expressive than one expects, due to the use of infinite conjunction in the construction ⋀<sub>i∈I</sub> F<sub>i</sub> when I is an infinite set. As usual we will write binary conjunction F<sub>1</sub> ∧ F<sub>2</sub> for ⋀<sub>i∈{1,2}</sub> F<sub>i</sub>, and binary disjunction F<sub>1</sub> ∨ F<sub>2</sub> for ¬⋀<sub>i∈{1,2}</sub> ¬F<sub>i</sub>. For two HMLbΔ formulas F<sub>1</sub>, F<sub>2</sub>, we say that F<sub>1</sub> and F<sub>2</sub> are equivalent logic formulas, written F<sub>1</sub> ⇔ F<sub>2</sub>, if for any process s of any LTS it holds that s |= F<sub>1</sub> if and only if s |= F<sub>2</sub>.

The following proposition shows that ⟨⟨u⟩⟩ is a derived operator, in the sense that it can be defined in terms of the just-before operator {u}.

**Proposition 2.** *For any HMLb*Δ *formula* F *and* a ≠ τ*, the following equivalences hold:*

*1.* ⟨⟨ε⟩⟩F ⇔ **true**{ε}F*; 2.* ⟨⟨a⟩⟩F ⇔ **true**{a}(**true**{ε}F)*.*

**Proof.** Immediately follows from Definition 4.
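Proposition 2 can be exercised on a finite LTS with a small evaluator for the Δ-free part of HMLbΔ. The sketch below is our own reconstruction: formulas are tuples, and the u = ε case of the just-before modality is taken to mean that the final state of the τ-run itself satisfies F<sub>2</sub>; this reading makes Proposition 2 come out true and is an assumption of this sketch.

```python
def tau_closure(lts, s):
    seen, stack = {s}, [s]
    while stack:
        for act, t in lts[stack.pop()]:
            if act == "tau" and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

TRUE = ("and",)  # the empty conjunction

def sat(lts, s, f):
    """f is ('and', *fs) | ('not', g) | ('jb', F1, u, F2) | ('dia', u, g),
    where u is an action name or 'eps'."""
    op = f[0]
    if op == "and":
        return all(sat(lts, s, g) for g in f[1:])
    if op == "not":
        return not sat(lts, s, f[1])
    if op == "jb":   # F1 {u} F2
        _, f1, u, f2 = f
        for t in tau_closure(lts, s):
            if not sat(lts, t, f1):
                continue
            if u == "eps":
                if sat(lts, t, f2):
                    return True
            elif any(a == u and sat(lts, t2, f2) for a, t2 in lts[t]):
                return True
        return False
    if op == "dia":  # <<u>> g
        _, u, g = f
        pre = tau_closure(lts, s)
        if u == "eps":
            return any(sat(lts, t, g) for t in pre)
        mid = {t2 for t in pre for a, t2 in lts[t] if a == u}
        return any(sat(lts, t1, g) for t2 in mid for t1 in tau_closure(lts, t2))
    raise ValueError(op)

L = {"s": {("tau", "u"), ("a", "v")}, "u": {("a", "v")},
     "v": {("tau", "s")}, "nil": set()}
F = ("dia", "a", TRUE)  # <<a>>true
for x in L:
    assert sat(L, x, ("dia", "eps", F)) == sat(L, x, ("jb", TRUE, "eps", F))
    assert sat(L, x, ("dia", "a", F)) == sat(L, x, ("jb", TRUE, "a", ("jb", TRUE, "eps", F)))
print("Proposition 2 holds on this sample LTS")
```

The Δ modality is omitted here; on a finite LTS it would amount to searching for a τ-reachable τ-cycle all of whose states satisfy the formula.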

We write HMLb for the sub-logic of HMLbΔ which consists of formulas constructed without the divergence modality Δ. Then HML, the usual Hennessy-Milner logic, is the sub-logic of HMLb consisting of formulas constructed without the just-before modality {u}. With the result in Proposition 2 that ⟨⟨u⟩⟩ is a derived operator of {u}, the following theorem immediately follows from the characterization result for Φ<sub>jb</sub><sup>Δ</sup> in [13].

**Theorem 2** *(HMLb*Δ *characterization of* ≈<sub>b</sub><sup>Δ</sup>*)***.** *Let* s, t *be two states. Then* s ≈<sub>b</sub><sup>Δ</sup> t *if and only if* s *and* t *satisfy the same set of HMLb*Δ *formulas.*

Likewise, the following theorem immediately follows from the characterization result for Φ<sub>jb</sub> in [5].

**Theorem 3** *(HMLb characterization of* ≈<sub>b</sub>*)***.** *Let* s, t *be two states. Then* s ≈<sub>b</sub> t *if and only if* s *and* t *satisfy the same set of HMLb formulas.*

The following is the famous Hennessy-Milner theorem, which can be found in Chap. 10 of [9].

**Theorem 4** *(HML characterization of* ≈*)***.** *Let* s, t *be two states. Then* s ≈ t *if and only if* s *and* t *satisfy the same set of HML formulas.*

The last three theorems give modal logic characterizations for ≈<sub>b</sub><sup>Δ</sup>, ≈<sub>b</sub> and ≈ respectively; still missing is a modal logic characterization for ≈<sup>Δ</sup>. Considering that HMLb is the extension of HML by the just-before modality and that HMLbΔ is the extension of HML by the just-before *and* the divergence modality, an obvious attempt is to extend HML with the divergence modality, in the hope that this will give us a logic which characterizes ≈<sup>Δ</sup>. However, it turns out that the divergence construction ΔF is not preserved by ≈<sup>Δ</sup>, as the following example shows.

*Example 1.* The drawing shows an LTS P = ⟨S, A, −→⟩ where A = {a<sub>i</sub> | i ≥ 0}, S = {s<sub>i</sub> | i ≥ 0} ∪ {t<sub>i</sub> | i ≥ 0}, and the transition relation is as follows:


Now define ≡ to be the following relation:

$$\{(s\_i, s\_i) \mid i \ge 0\} \cup \{(t\_i, t\_i) \mid i \ge 0\} \cup \{(s\_i, t\_i) \mid i \ge 0\} \cup \{(t\_i, s\_i) \mid i \ge 0\}.$$

The following facts about ≡ are easy to verify:


Thus ≡ is a weak bisimulation with explicit divergence, and s<sub>0</sub> ≈<sup>Δ</sup> t<sub>0</sub>. In the following we show that there is an HML formula F such that ΔF is satisfied by s<sub>0</sub> but not by t<sub>0</sub>.

Let F<sub>k</sub> be the following formula:

$$(\langle\!\langle a\_{2k}\rangle\!\rangle \mathtt{true} \land \langle\!\langle a\_{2k+1}\rangle\!\rangle \mathtt{true}) \lor (\neg\langle\!\langle a\_{2k}\rangle\!\rangle \mathtt{true} \land \neg\langle\!\langle a\_{2k+1}\rangle\!\rangle \mathtt{true}).$$

That is, F<sub>k</sub> asserts that the actions a<sub>2k</sub> and a<sub>2k+1</sub> are either both enabled or both disabled. It is clear that F<sub>k</sub> holds for every state of S except s<sub>2k+1</sub> and t<sub>2k+1</sub>. Thus ⋀{F<sub>k</sub> | k ≥ 0} holds on every even-numbered state (i.e. s<sub>0</sub>, t<sub>0</sub>, s<sub>2</sub>, t<sub>2</sub>, …) and fails on every odd-numbered state (i.e. s<sub>1</sub>, t<sub>1</sub>, s<sub>3</sub>, t<sub>3</sub>, …). Now Δ⋀{F<sub>k</sub> | k ≥ 0} is satisfied by s<sub>0</sub> but not by t<sub>0</sub>. To see this, note that from s<sub>0</sub> there is an infinite τ-run σ = s<sub>0</sub> τ s<sub>2</sub> τ … s<sub>2k</sub> τ … and every state on σ satisfies ⋀{F<sub>k</sub> | k ≥ 0}, while the only infinite τ-run from t<sub>0</sub> is t<sub>0</sub> τ t<sub>1</sub> τ …, on which there are infinitely many states that do not satisfy ⋀{F<sub>k</sub> | k ≥ 0}.

Thus, we need to find a different divergence modality. For that we introduce the *weak divergence modality* Δ<sub>ε</sub> into HML<sub>b</sub><sup>Δ</sup>, by extending the BNF rules as follows:

$$F ::= \dots \mid \Delta\_{\epsilon} F.$$

We then add the following interpretation to Definition 4.

6. s |= Δ<sub>ε</sub>F if there is an infinite τ-run σ from s such that for every state s′ on σ it holds that s′ =⇒ t for some t |= F.

The following is a depiction of the condition for s |= Δ<sub>ε</sub>F.

**Proposition 3.** *For any HML<sub>b</sub><sup>Δ</sup> formula* F*, the following equivalence holds:*

$$\Delta\_{\epsilon} F \Leftrightarrow \Delta \langle\!\langle \epsilon \rangle\!\rangle F.$$

**Proof.** Immediately follows from Definition 4 together with the above interpretation of Δ<sub>ε</sub>F.

This proposition shows that Δ<sub>ε</sub> is a derived operator of Δ and ⟨⟨ε⟩⟩, so adding Δ<sub>ε</sub> to HML<sub>b</sub><sup>Δ</sup> does not increase the expressiveness of the logic. We therefore still call the logic HML<sub>b</sub><sup>Δ</sup> after extending it with Δ<sub>ε</sub>, and we write HML<sup>Δε</sup> for the sub-logic in which the only modalities allowed are the weak possibility modality ⟨⟨u⟩⟩ and the weak divergence modality Δ<sub>ε</sub>. With the new divergence modality we also obtain another sub-logic, HML<sub>b</sub><sup>Δε</sup>, in which Δ<sub>ε</sub> is allowed but not Δ.

Given a sub-logic L of HML<sub>b</sub><sup>Δ</sup>, it induces an equivalence relation ≡<sub>L</sub> on states such that s ≡<sub>L</sub> t if and only if s and t satisfy the same set of formulas of the sub-logic. We call ≡<sub>L</sub> the equivalence induced by L. The following summarizes the sub-logics of HML<sub>b</sub><sup>Δ</sup> that we are concerned with and the corresponding induced equivalences:

| sub-logic | HML | HML<sup>Δε</sup> | HML<sub>b</sub> | HML<sub>b</sub><sup>Δε</sup> |
| --- | --- | --- | --- | --- |
| induced equivalence | ≡<sub>w</sub> | ≡<sub>w</sub><sup>Δε</sup> | ≡<sub>b</sub> | ≡<sub>b</sub><sup>Δε</sup> |
In the rest of this section we show that HML<sup>Δε</sup> characterizes ≈<sup>Δ</sup>, i.e. that ≈<sup>Δ</sup> coincides with ≡<sub>w</sub><sup>Δε</sup>. To prove ≈<sup>Δ</sup> ⊆ ≡<sub>w</sub><sup>Δε</sup>, we show that for every weak bisimulation with explicit divergence ≡ it holds that ≡ ⊆ ≡<sub>w</sub><sup>Δε</sup> (Lemma 5). To prove ≡<sub>w</sub><sup>Δε</sup> ⊆ ≈<sup>Δ</sup>, we show that ≡<sub>w</sub><sup>Δε</sup> is a weak bisimulation with explicit divergence (Lemma 8).

Example 1 shows that ΔF is not preserved by ≈<sup>Δ</sup>, while the following lemma guarantees that Δ<sub>ε</sub>F is preserved by ≈<sup>Δ</sup>. We omit the proof here.

**Lemma 5.** *Let* ≡ *be a weak bisimulation with explicit divergence and* F *an HML<sup>Δε</sup> formula. If* s ≡ t *and* s |= F*, then* t |= F*. Thus if* ≡ *is a weak bisimulation with explicit divergence, then* ≡ ⊆ ≡<sub>w</sub><sup>Δε</sup>*.*

**Lemma 6.** *Let* s =⇒ t*. Then*


**Proof.** We only prove 3; items 1 and 2 can be proved with a similar idea.

Suppose t |= Δ<sub>ε</sub>F. Then from t there is an infinite τ-run ρ such that for each state t′ on ρ there exists t″ with t′ =⇒ t″ and t″ |= F. Now, since s =⇒ t, by adding a prefix to ρ we easily obtain an infinite τ-run ρ′ with starting state s such that for each state t′ on ρ′ there exists t″ with t′ =⇒ t″ and t″ |= F; hence s |= Δ<sub>ε</sub>F.

The following is the so-called *stuttering lemma* for ≡<sub>w</sub><sup>Δε</sup>.

**Lemma 7.** *If* s =⇒ s′*,* s′ =⇒ t*, and* s ≡<sub>w</sub><sup>Δε</sup> t*, then* s ≡<sub>w</sub><sup>Δε</sup> s′*.*

**Proof.** In this case we only need to prove the following: for any HML<sup>Δε</sup> formula F, it holds that s |= F if and only if s′ |= F. We carry out the proof by induction on the structure of F. For ⋀<sub>i∈I</sub> F<sub>i</sub>, we have the following sequence of equivalences: s |= ⋀<sub>i∈I</sub> F<sub>i</sub> iff s |= F<sub>i</sub> for every i ∈ I (by the definition of |=) iff s′ |= F<sub>i</sub> for every i ∈ I (by the induction hypothesis) iff s′ |= ⋀<sub>i∈I</sub> F<sub>i</sub> (by the definition of |=). In the same way we can prove the case ¬F.

For ⟨⟨u⟩⟩F, suppose s |= ⟨⟨u⟩⟩F. Then t |= ⟨⟨u⟩⟩F by s ≡<sub>w</sub><sup>Δε</sup> t, and it immediately follows that s′ |= ⟨⟨u⟩⟩F by s′ =⇒ t and Lemma 6. On the other hand, suppose s′ |= ⟨⟨u⟩⟩F; then s |= ⟨⟨u⟩⟩F immediately follows by s =⇒ s′ and Lemma 6. In the same way we can prove the case Δ<sub>ε</sub>F.

**Lemma 8.** ≡<sub>w</sub><sup>Δε</sup> *is a weak bisimulation with explicit divergence.*

**Proof.** To prove that ≡<sub>w</sub><sup>Δε</sup> is a weak bisimulation with explicit divergence, we need to establish the following:

1. ≡<sub>w</sub><sup>Δε</sup> is an equivalence relation;
2. ≡<sub>w</sub><sup>Δε</sup> satisfies the transfer conditions of a weak bisimulation;
3. if s ≡<sub>w</sub><sup>Δε</sup> t and there is an infinite τ-run from s with all its states ≡<sub>w</sub><sup>Δε</sup>-equivalent to s, then there is such a run from t as well.
It is obvious that ≡<sub>w</sub><sup>Δε</sup> is an equivalence relation. The way to prove that ≡<sub>w</sub><sup>Δε</sup> is a weak bisimulation is exactly the same as the way to prove that ≡<sub>w</sub> is a weak bisimulation [9]. We prove 3 in the following.

First, let us note that for any pair of states s, t with s ≢<sub>w</sub><sup>Δε</sup> t, by the definition of ≡<sub>w</sub><sup>Δε</sup> there exists an HML<sup>Δε</sup> formula F<sup>s</sup><sub>t</sub>, often called a *distinguishing formula* of s and t, such that s |= F<sup>s</sup><sub>t</sub> and t ⊭ F<sup>s</sup><sub>t</sub>.

Suppose s ≡<sub>w</sub><sup>Δε</sup> t and s ⇑<sub>≡<sub>w</sub><sup>Δε</sup></sub>; then there is an infinite τ-run ρ from s with all the states on it ≡<sub>w</sub><sup>Δε</sup>-equivalent to s. We construct the following formula F<sub>s</sub>:

$$\bigwedge \{ F\_u^s \mid t \stackrel{\tau}{\Longrightarrow} u,\ u \not\equiv\_w^{\Delta\_\epsilon} s \}.$$

Clearly s |= F<sub>s</sub>. Moreover s |= Δ<sub>ε</sub>F<sub>s</sub>, since for any state s′ on ρ there is s″ such that s′ =⇒ s″ and s″ |= F<sub>s</sub> (just take s″ to be s′ itself; then s′ =⇒ s′, and s′ |= F<sub>s</sub> by s′ ≡<sub>w</sub><sup>Δε</sup> s). Now because t ≡<sub>w</sub><sup>Δε</sup> s, we have t |= Δ<sub>ε</sub>F<sub>s</sub>. In the following we show that t |= Δ<sub>ε</sub>F<sub>s</sub> implies t ⇑<sub>≡<sub>w</sub><sup>Δε</sup></sub>.

Since t |= Δ<sub>ε</sub>F<sub>s</sub>, there is an infinite τ-run σ from t such that for any state t′ on σ there exists t″ with t′ =⇒ t″ and t″ |= F<sub>s</sub>. Now we show that if t′ is a state on σ, then t′ ≡<sub>w</sub><sup>Δε</sup> t.

Note that the construction of F<sub>s</sub> guarantees the following property:

if t =⇒ t′ and t′ |= F<sub>s</sub>, then t′ ≡<sub>w</sub><sup>Δε</sup> t.

To see this, let t =⇒ t′ and suppose t′ ≢<sub>w</sub><sup>Δε</sup> t. Then t′ ≢<sub>w</sub><sup>Δε</sup> s, which implies t′ ⊭ F<sub>s</sub>, because in this case F<sup>s</sup><sub>t′</sub>, a distinguishing formula of s and t′, is one of the conjuncts of F<sub>s</sub>, and t′ ⊭ F<sup>s</sup><sub>t′</sub>.

Now take any state t′ on σ. Since t =⇒ t′ and t′ =⇒ t″ for some t″ with t″ |= F<sub>s</sub>, the above property of F<sub>s</sub> gives t″ ≡<sub>w</sub><sup>Δε</sup> t, and then Lemma 7 gives t′ ≡<sub>w</sub><sup>Δε</sup> t. Thus σ is the infinite τ-run that we are looking for. At last, we can state the modal characterization theorem for ≈<sup>Δ</sup>.

**Theorem 5** *(HML<sup>Δε</sup> characterization of* ≈<sup>Δ</sup>*)***.** ≡<sub>w</sub><sup>Δε</sup> *coincides with* ≈<sup>Δ</sup>*; that is, for any pair of states* s *and* t*,* s ≈<sup>Δ</sup> t *if and only if* s *and* t *satisfy the same set of HML<sup>Δε</sup> formulas.*

**Proof.** By Lemma 5, ≈<sup>Δ</sup> ⊆ ≡<sub>w</sub><sup>Δε</sup>, and by Lemma 8, ≡<sub>w</sub><sup>Δε</sup> is a weak bisimulation with explicit divergence, hence ≡<sub>w</sub><sup>Δε</sup> ⊆ ≈<sup>Δ</sup>.

At the same time we obtain the following theorem, which justifies the definition of ≈<sup>Δ</sup>.

**Theorem 6.** ≈<sup>Δ</sup> *is a weak bisimulation with explicit divergence, and it is the largest weak bisimulation with explicit divergence.*

**Proof.** By Lemmas 5 and 8, ≡<sub>w</sub><sup>Δε</sup> is the largest weak bisimulation with explicit divergence. By Theorem 5, ≈<sup>Δ</sup> is the same as ≡<sub>w</sub><sup>Δε</sup>, hence ≈<sup>Δ</sup> is the largest weak bisimulation with explicit divergence.

Perhaps a little surprising is the following new modal characterization result for branching bisimilarity with explicit divergence ≈<sub>b</sub><sup>Δ</sup>.

**Theorem 7** *(HML<sub>b</sub><sup>Δε</sup> characterization of* ≈<sub>b</sub><sup>Δ</sup>*)***.** *Let* s, t *be two states. Then* s ≈<sub>b</sub><sup>Δ</sup> t *if and only if* s *and* t *satisfy the same set of HML<sub>b</sub><sup>Δε</sup> formulas.*

**Proof.** We give the following proof sketch.

Suppose s ≈<sub>b</sub><sup>Δ</sup> t and s |= F for some HML<sub>b</sub><sup>Δε</sup> formula F. By Proposition 3 there is an HML<sub>b</sub><sup>Δ</sup> formula F′ with F ⇔ F′; then s |= F′, by Theorem 2 t |= F′, and thus t |= F.

For the other direction, we prove that ≡<sub>b</sub><sup>Δε</sup> is a branching bisimulation with explicit divergence. That ≡<sub>b</sub><sup>Δε</sup> is a branching bisimulation can be proved in the same way as the proof that ≡<sub>b</sub> is a branching bisimulation in the proof of Theorem 3 of [5]. Suppose s ≡<sub>b</sub><sup>Δε</sup> t and there is an infinite τ-run from s with all the states on the run in the same ≡<sub>b</sub><sup>Δε</sup>-equivalence class as s; then we can prove that there is an infinite τ-run from t with all the states on the run in the same ≡<sub>b</sub><sup>Δε</sup>-equivalence class as t, just as we proved it for ≡<sub>w</sub><sup>Δε</sup> in Lemma 8, with the help of a lemma similar to Lemma 7 with ≡<sub>b</sub><sup>Δε</sup> in place of ≡<sub>w</sub><sup>Δε</sup>.

By Theorems 2 and 7, HML<sub>b</sub><sup>Δ</sup> and HML<sub>b</sub><sup>Δε</sup> both characterize ≈<sub>b</sub><sup>Δ</sup>. Now the relationships between the various bisimulation equivalences and the logics can be summarized by the lattice-shaped diagrams above: on the left, the equivalence at the higher end of an edge is included in the equivalence at the lower end of the edge; on the right, the logic at the lower end of an edge is a sub-logic of the one at the higher end of the edge; and the dotted lines represent the logic characterization results.

### **4 Divergence in Finite State Systems**

The motivating problem of this section is that of checking complete weak bisimilarity for finite-state processes:

given an LTS ⟨S, A, −→⟩ and two states s, t ∈ S, where S and A are finite sets, decide whether s ≈<sup>Δ</sup> t.

We will show that this problem can be solved by reducing it to the problem of checking weak bisimilarity for finite-state processes, which can be solved by a well-known partition algorithm [7]:

given an LTS ⟨S, A, −→⟩ and two states s, t ∈ S, where S and A are finite sets, decide whether s ≈ t.

The reduction is as follows. Let P = ⟨S, A, −→⟩ be a finite-state labelled transition system, i.e. both S and A are finite sets, and let δ be an action not in A. Then we construct a new finite-state LTS P<sub>δ</sub> = ⟨S̄, Ā, −→<sub>δ</sub>⟩ where S̄ = {ŝ | s ∈ S}, Ā = A ∪ {δ}, and −→<sub>δ</sub> = {(ŝ, α, ŝ′) | s −α→ s′} ∪ {(ŝ, δ, ŝ) | s =<sup>τ</sup>⇒ s}.

The idea of the reduction is quite straightforward: in a finite-state system, the existence of an infinite τ-run from a state s is equivalent to the existence of a so-called *looping state* s′ such that s =⇒ s′ and s′ =<sup>τ</sup>⇒ s′, and the looping states can then be marked by a particular new action δ. Thus the transitions of the constructed system P<sub>δ</sub> are like those of the original system P, except that every looping state s is indicated by a new transition ŝ −δ→ ŝ. In the following, when it causes no confusion, we simply write ŝ −α→ t̂ instead of ŝ −α→<sub>δ</sub> t̂ for s, t ∈ S.
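As a sanity check of the construction, the following is a small sketch (ours, not from the paper) that builds P<sub>δ</sub> from an LTS given as a list of (source, action, target) triples, with τ written `"tau"`, δ written `"delta"`, and the hatted copies of states represented by tagging:

```python
def tau_closure(states, trans):
    """Map each state s to {s' | s =⇒ s'}: reachability by zero or more tau-steps."""
    reach = {s: {s} for s in states}
    changed = True
    while changed:
        changed = False
        for (p, a, q) in trans:
            if a != "tau":
                continue
            for s in states:
                if p in reach[s] and q not in reach[s]:
                    reach[s].add(q)
                    changed = True
    return reach

def build_p_delta(states, trans):
    """Build P_delta: copy every transition onto the hatted states and add a
    delta self-loop at every looping state, i.e. every s with a tau-cycle
    through it (s =⇒ s via at least one tau-step)."""
    reach = tau_closure(states, trans)
    looping = {s for s in states
               if any(a == "tau" and p in reach[s] and s in reach[q]
                      for (p, a, q) in trans)}
    hat = lambda s: ("hat", s)
    new_trans = [(hat(p), a, hat(q)) for (p, a, q) in trans]
    new_trans += [(hat(s), "delta", hat(s)) for s in sorted(looping)]
    return [hat(s) for s in states], new_trans
```

On the LTS with s0 −τ→ s1, s1 −τ→ s0 and s2 −a→ s0, the states s0 and s1 receive a δ self-loop while s2 does not, since no τ-cycle passes through s2.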

Now, to complete the reduction, we show that for any s, t ∈ S it holds that s ≈<sup>Δ</sup> t if and only if ŝ ≈ t̂; then, in order to check whether s ≈<sup>Δ</sup> t, we only need to check whether ŝ ≈ t̂. To show that s ≈<sup>Δ</sup> t if and only if ŝ ≈ t̂, one could show that ≡ ⊆ S × S is a weak bisimulation with explicit divergence if and only if ≡̂ = {(ŝ, t̂) | s ≡ t} is a weak bisimulation. However, with the logic characterization results of the last section, here we take a different approach, which reveals essential properties of the reduction construction, stated in Theorems 8 and 9 below, and which allows us to obtain the more general results stated in Theorem 10 below.

We define a translation function which maps every HML<sub>b</sub><sup>Δ</sup> formula F to another formula F̄. The function is defined inductively on the structure of the formula as follows:

$$\begin{array}{ll}
\overline{\bigwedge\_{i \in I} F\_i} = \bigwedge\_{i \in I} \overline{F\_i} & \overline{\neg F} = \neg \overline{F} \\
\overline{F\_1\{u\}F\_2} = \overline{F\_1}\{u\}\overline{F\_2} \quad (u \neq \delta) & \overline{F\_1\{\delta\}F\_2} = \neg\mathtt{true} \\
\overline{\langle\!\langle u \rangle\!\rangle F} = \langle\!\langle u \rangle\!\rangle \overline{F} \quad (u \neq \delta) & \overline{\langle\!\langle \delta \rangle\!\rangle F} = \neg\mathtt{true} \\
\overline{\Delta F} = \mathtt{true}\{\delta\}\overline{F} & \overline{\Delta\_\epsilon F} = \langle\!\langle \delta \rangle\!\rangle \overline{F}
\end{array}$$

**Theorem 8.** *If* F *is an HML<sub>b</sub><sup>Δ</sup> formula, then* F̄ *is an HML<sub>b</sub> formula. Moreover, if* F *is an HML<sup>Δε</sup> formula, then* F̄ *is an HML formula.*

*For a finite-state LTS* P = ⟨S, A, −→⟩*, let* P<sub>δ</sub> = ⟨S̄, Ā, −→<sub>δ</sub>⟩ *be the finite-state LTS constructed above, and let* s ∈ S*. Then for any HML<sub>b</sub><sup>Δ</sup> formula* F*, it holds that* s |= F *if and only if* ŝ |= F̄*.*

The proof, which is omitted here, is a routine induction on the structure of formulas. We just explain the idea behind the translation function, from which one can see the rationale behind Theorem 8. The key is to understand why F<sub>1</sub>{δ}F<sub>2</sub> is translated to ¬**true**. As we have pointed out above, δ is an action which is not in A and which is used in the reduction to mark divergence. This implies that no process s of P is capable of a δ action, hence the property F<sub>1</sub>{δ}F<sub>2</sub> will never be satisfied by any process of P. That is why F<sub>1</sub>{δ}F<sub>2</sub> is translated to ¬**true**. For the same reason, ⟨⟨δ⟩⟩F is also translated to ¬**true**.

Conversely, we can define a translation function which maps every HML<sub>b</sub> formula F to an HML<sub>b</sub><sup>Δ</sup> formula F̲. The function is defined inductively on the structure of the formula as follows:

$$\begin{array}{ll}
\underline{\bigwedge\_{i \in I} F\_i} = \bigwedge\_{i \in I} \underline{F\_i} & \underline{\neg F} = \neg \underline{F} \\
\underline{F\_1\{u\}F\_2} = \underline{F\_1}\{u\}\underline{F\_2} \quad (u \neq \delta) & \underline{F\_1\{\delta\}F\_2} = \Delta(\underline{F\_1} \wedge \underline{F\_2}) \\
\underline{\langle\!\langle u \rangle\!\rangle F} = \langle\!\langle u \rangle\!\rangle \underline{F} \quad (u \neq \delta) & \underline{\langle\!\langle \delta \rangle\!\rangle F} = \Delta\_\epsilon \underline{F}
\end{array}$$
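This second translation is mechanical enough to transcribe directly. The sketch below is ours, with an assumed tagged-tuple encoding of formulas (the encoding and names are not from the paper); the two δ-clauses are the interesting ones:

```python
DELTA = "delta"  # the fresh action used in the reduction to mark divergence

def underline(f):
    """Translate an HMLb formula over P_delta into an HMLbΔ formula over P.
    Assumed encoding: ("and", fs), ("not", g), ("jb", f1, u, f2) for F1{u}F2,
    ("wdia", u, g) for the weak possibility modality; the image additionally
    uses ("div", g) for the divergence modality and ("wdiv", g) for the weak
    divergence modality."""
    tag = f[0]
    if tag == "and":
        return ("and", [underline(g) for g in f[1]])
    if tag == "not":
        return ("not", underline(f[1]))
    if tag == "jb":
        _, f1, u, f2 = f
        if u == DELTA:                 # F1{delta}F2 maps to Div(F1 and F2)
            return ("div", ("and", [underline(f1), underline(f2)]))
        return ("jb", underline(f1), u, underline(f2))
    if tag == "wdia":
        _, u, g = f
        if u == DELTA:                 # <<delta>>G maps to weak divergence of G
            return ("wdiv", underline(g))
        return ("wdia", u, underline(g))
    raise ValueError(f"unknown formula: {f!r}")
```

Encoding **true** as the empty conjunction, the translation sends ⟨⟨δ⟩⟩**true** to the weak divergence of **true**, i.e. a pure divergence assertion.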

**Theorem 9.** *If* F *is an HML<sub>b</sub> formula, then* F̲ *is an HML<sub>b</sub><sup>Δ</sup> formula. Moreover, if* F *is an HML formula, then* F̲ *is an HML<sup>Δε</sup> formula.*

*For a finite-state LTS* P = ⟨S, A, −→⟩*, let* P<sub>δ</sub> = ⟨S̄, Ā, −→<sub>δ</sub>⟩ *be the finite-state LTS constructed above, and let* s ∈ S*. Then for any HML<sub>b</sub> formula* F*, it holds that* s |= F̲ *if and only if* ŝ |= F*.*

Now we obtain the following theorem, which guarantees the correctness of our reduction.

**Theorem 10.** *For a finite-state LTS* P = ⟨S, A, −→⟩*, let* P<sub>δ</sub> = ⟨S̄, Ā, −→<sub>δ</sub>⟩ *be the finite-state LTS constructed above. Then for* s, t ∈ S*:*

*1.* s ≈<sup>Δ</sup> t *if and only if* ŝ ≈ t̂*; 2.* s ≈<sub>b</sub><sup>Δ</sup> t *if and only if* ŝ ≈<sub>b</sub> t̂*.*

**Proof.** We only prove 1; the proof of 2 is analogous. Since ≈<sup>Δ</sup> coincides with ≡<sub>w</sub><sup>Δε</sup> and ≈ coincides with ≡<sub>w</sub>, to prove 1 we only need to prove that s ≡<sub>w</sub><sup>Δε</sup> t if and only if ŝ ≡<sub>w</sub> t̂.

Suppose s ≡<sub>w</sub><sup>Δε</sup> t. If ŝ |= F for some HML formula F, then by Theorem 9, F̲ is an HML<sup>Δε</sup> formula and s |= F̲. Then by the condition that s ≡<sub>w</sub><sup>Δε</sup> t, we have t |= F̲, and again by Theorem 9, t̂ |= F. Thus ŝ ≡<sub>w</sub> t̂.

Suppose ŝ ≡<sub>w</sub> t̂. If s |= F for some HML<sup>Δε</sup> formula F, then by Theorem 8, F̄ is an HML formula and ŝ |= F̄. Then by the condition that ŝ ≡<sub>w</sub> t̂, we have t̂ |= F̄, and again by Theorem 8, t |= F. Thus s ≡<sub>w</sub><sup>Δε</sup> t.

Theorem 8 also suggests a simple solution (among possibly many) to the model checking problem for HML<sub>b</sub><sup>Δ</sup>. The model checking problem here asks, for a given state s of a finite-state LTS P and a given finite HML<sub>b</sub><sup>Δ</sup> formula F (finite in the sense that only finite conjunctions are allowed in F), whether s |= F holds. By Theorem 8, this problem can be reduced to the problem of deciding whether ŝ |= F̄ holds, which comes with simple decision procedures because ŝ is a state of the finite-state LTS P<sub>δ</sub> and F̄ is a finite HML<sub>b</sub> formula.

### **5 Conclusion**

To summarize, by introducing a new divergence modality, the weak divergence modality Δ<sub>ε</sub>, we obtain logic characterization results for two divergence-sensitive bisimulation equivalence relations. One is the first modal logic characterization of complete weak bisimilarity ≈<sup>Δ</sup>, and the other is a new modal logic characterization of branching bisimilarity with explicit divergence ≈<sub>b</sub><sup>Δ</sup>. With these new characterization results we showed a clear picture of the sub-logic relationships among the various characterizing logics. Using these new characterization results, we provide reductions from the divergence-sensitive equality checking and model checking problems to the respective divergence-blind equality checking and model checking problems for finite-state systems.

Complete weak bisimilarity ≈<sup>Δ</sup> was first defined in [10]; it is a refinement of weak bisimilarity ≈ [9] that takes divergence behaviour into account. Since it is a relatively new equivalence relation, its logic characterization problem and its equality checking problem for finite-state systems had not been treated before this paper. The relation ≈<sub>b</sub><sup>Δ</sup> was defined in [12] as a refinement of branching bisimilarity ≈<sub>b</sub> [12]. In [15], the equality checking problem of *stutter equivalence* on Kripke structures is solved by a reduction to the equality checking problem of *divergence-blind* stutter equivalence. Stutter equivalence and divergence-blind stutter equivalence are the Kripke-structure counterparts of branching bisimilarity with explicit divergence and branching bisimilarity, respectively. The reduction presented in Sect. 4 is inspired by the reduction in [15].

The study of modal logic characterizations of bisimulation equivalence relations was initiated by Hennessy and Milner in [2]. For branching bisimilarity, modal characterization results were studied in [5,6], where different modalities for branching structures were used. In [6], besides the extension of Hennessy-Milner logic with the until operator mentioned earlier in the paper, two other logics were proposed to characterize branching bisimilarity. One is another extension of Hennessy-Milner logic which exploits the power of backward modalities. The other is CTL<sup>∗</sup> without the next-time operator, interpreted over all paths, not just over maximal ones. In [13] a modal logic was proposed to characterize branching bisimilarity with explicit divergence, combining the modalities for branching bisimilarity from [5] with a divergence modality Δ. In [14], an extension of CTL<sup>∗</sup> without the next operator is proposed which also characterizes branching bisimilarity with explicit divergence.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Lambda-Calculi and Types

### **Call-by-Need, Neededness and All That**

Delia Kesner<sup>1</sup>, Alejandro Ríos<sup>2</sup>, and Andrés Viso<sup>2,3(B)</sup>

<sup>1</sup> IRIF, CNRS and Univ. Paris-Diderot, Paris, France
<sup>2</sup> Universidad de Buenos Aires, Buenos Aires, Argentina
<sup>3</sup> CONICET, Buenos Aires, Argentina
aeviso@dc.uba.ar

**Abstract.** We show that call-by-need is observationally equivalent to weak-head needed reduction. The proof of this result uses a semantical argument based on a (non-idempotent) intersection type system called V. Interestingly, system V also allows us to syntactically identify all the weak-head needed redexes of a term.

### **1 Introduction**

One of the fundamental notions underlying this paper is that of *needed reduction* in the λ-calculus, which is used here to understand (lazy) evaluation of functional programs. Key notions are those of reducible and non-reducible programs: the former are programs (represented by λ-terms) containing non-evaluated subprograms, called reducible expressions (redexes), whereas the latter can be seen as definitive results of computations, called normal forms. It turns out that every reducible program contains a special kind of redex known as needed; in other words, every λ-term not in normal form contains a needed redex. A redex r is said to be *needed* in a λ-term t if r has to be contracted (*i.e.* evaluated) sooner or later when reducing t to *normal form*, or, informally, if there is no way of avoiding r when reaching a normal form.

The needed strategy, which always contracts a needed redex, is normalising [8], *i.e.* if a term can be reduced (in any way) to a normal form, then contraction of needed redexes necessarily terminates. This is an excellent starting point to design an evaluation strategy, but unfortunately, neededness of a redex is not decidable [8]. As a consequence, real implementations of functional languages cannot be directly based on this notion.

Our goal is, however, to establish a clear connection between the semantical notion of neededness and different implementations of lazy functional languages (*e.g.* Miranda or Haskell). Such implementations are based on *call-by-need calculi*, pioneered by Wadsworth [20], and extensively studied *e.g.* in [3]. Indeed, call-by-need calculi fill the gap between the well-known operational semantics of the call-by-name λ-calculus and the actual implementations of lazy functional languages. While call-by-name re-evaluates an argument each time it is used – an

This work was partially funded by LIA INFINIS.

© The Author(s) 2018

C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 241–257, 2018. https://doi.org/10.1007/978-3-319-89366-2\_13

operation which is quite expensive – call-by-need can be seen as a *memoized* version of call-by-name, where the value of an argument is stored the first time it is evaluated and reused afterwards. For example, if t = Δ (I I), where Δ = λx.x x and I = λz.z, then call-by-name duplicates the argument I I, while lazy languages first reduce I I to the value I so that further uses of this argument do not need to evaluate it again.
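This sharing is exactly what a *thunk* implements in lazy language runtimes. The following minimal sketch (ours; it illustrates the memoization idea only, not the call-by-need calculus itself) evaluates the shared argument once even though the body x x uses it twice:

```python
class Thunk:
    """A memoized suspension: the suspended computation runs at most once."""
    def __init__(self, compute):
        self.compute = compute
        self.forced = False
        self.value = None

    def force(self):
        if not self.forced:
            self.value = self.compute()
            self.forced = True
        return self.value

calls = 0

def eval_arg():
    """Stands for evaluating the argument I I down to the value I."""
    global calls
    calls += 1
    return "I"

arg = Thunk(eval_arg)
# The body x x uses its argument twice: call-by-need shares one evaluation,
# whereas call-by-name would run eval_arg once per use.
result = (arg.force(), arg.force())
```

Under call-by-name, both uses would trigger a fresh evaluation; here the second `force` just returns the stored value.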

While the notion of needed reduction is defined with respect to (full strong) *normal forms*, call-by-need calculi evaluate programs to special values called *weak-head normal forms*, which are either abstractions or arbitrary applications headed by a variable (*i.e.* terms of the form x t<sub>1</sub> … t<sub>n</sub> where t<sub>1</sub> … t<sub>n</sub> are arbitrary terms). To overcome this shortfall, we first adapt the notion of needed redex to terms that are not going to be fully reduced to *normal forms* but only to *weak-head normal forms*. Thus, informally, a redex r is *weak-head needed* in a term t if r has to be contracted sooner or later when reducing t to a weak-head normal form. The derived notion of strategy is called the *weak-head needed strategy*, which always contracts a weak-head needed redex.

This paper introduces two independent results about weak-head neededness, both obtained by means of (non-idempotent) intersection types [12,13] (a survey can be found in [9]). We consider, in particular, the typing system V [14] and show that it allows us to identify all the weak-head needed redexes of a weak-head normalising term. This is done by adapting the classical notion of *principal type* [17] and proving that a redex in a weak-head normalising term t is weak-head needed iff it is typed in a principally typed derivation for t in V.

Our second goal is to show observational equivalence between call-by-need and weak-head needed reduction. Two terms are observationally equivalent when all the empirically testable computations on them are identical. This means that a term t can be evaluated to a weak-head normal form using the call-by-need machinery if and only if the weak-head needed reduction normalises t.

By means of the system V mentioned above, we obtain a technique to reason about observational equivalence that is flexible, general and easy to verify or even certify. Indeed, system V provides a semantic argument: we first show that a term t is typable in system V iff it is normalising for the weak-head needed strategy (t ∈ WN<sub>whnd</sub>); then, by resorting to some results in [14], we show that system V is complete for call-by-name, *i.e.* a term t is typable in system V iff t is normalising for call-by-name (t ∈ WN<sub>name</sub>), and that t is normalising for call-by-name iff t is normalising for call-by-need (t ∈ WN<sub>need</sub>). This completes the following chain of equivalences:

t ∈ WN<sub>whnd</sub> ⟺ t typable in V ⟺ t ∈ WN<sub>name</sub> ⟺ t ∈ WN<sub>need</sub>

This leads to the observational equivalence between call-by-need, call-by-name and weak-head needed reduction.

*Structure of the paper*: Sect. 2 introduces preliminary concepts, while Sect. 3 defines different notions of needed reduction. The type system V is studied in Sect. 4. Section 5 extends β-reduction to derivation trees. We show in Sect. 6 how system V identifies weak-head needed redexes, while Sect. 7 gives a characterisation of normalisation for the weak-head needed reduction. Section 8 is devoted to defining call-by-need. Finally, Sect. 9 presents the observational equivalence result.

### **2 Preliminaries**

This section introduces some standard definitions and notions concerning the reduction strategies studied in this paper, that is, call-by-name, head and weak-head reduction, and neededness, the latter notion being based on the *theory of residuals* [7].

#### **2.1 The Call-by-Name Lambda-Calculus**

Given a countably infinite set X of variables x, y, z, …, we consider the following grammar:

```
         (Terms) t, u ::= x ∈ X | t u | λx.t
        (Values) v ::= λx.t
      (Contexts) C ::= □ | C t | t C | λx.C
(Name contexts) E ::= □ | E t
```
The set of λ-terms is denoted by T<sub>a</sub>. We use I, K and Ω to denote the terms λx.x, λx.λy.x and (λx.x x) (λx.x x) respectively. We use C⟨t⟩ (resp. E⟨t⟩) for the term obtained by replacing the hole of C (resp. E) by t. The sets of *free* and *bound variables* of a term t, written fv(t) and bv(t) respectively, are defined as usual [7]. We work with the standard notion of α-*conversion*, *i.e.* renaming of bound variables for abstractions; thus, for example, λx.x y =<sub>α</sub> λz.z y.

A term of the form (λx.t) u is called a β-*redex* (or just *redex* when β is clear from the context), and λx is called the *anchor* of the redex. The *one-step reduction relation* →<sub>β</sub> (resp. →<sub>name</sub>) is given by the closure under contexts C (resp. E) of the rewriting rule (λx.t) u →<sub>β</sub> t{x/u}, where {_/_} denotes standard capture-free higher-order substitution. Thus, call-by-name forbids reduction inside arguments and λ-abstractions: *e.g.* (λx.I I) (I I) →<sub>β</sub> (λx.I I) I and (λx.I I) (I I) →<sub>β</sub> (λx.I) (I I), but neither (λx.I I) (I I) →<sub>name</sub> (λx.I I) I nor (λx.I I) (I I) →<sub>name</sub> (λx.I) (I I) holds. We write ↠<sub>β</sub> (resp. ↠<sub>name</sub>) for the reflexive-transitive closure of →<sub>β</sub> (resp. →<sub>name</sub>).
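A one-step →<sub>name</sub> interpreter can be sketched directly from the grammar of name contexts E ::= □ | E t (a sketch of ours, with an assumed tuple encoding of terms; the substitution is naive and therefore assumes bound variables are renamed apart from the free variables of the argument):

```python
def subst(t, x, u):
    """t{x/u}, assuming bv(t) is disjoint from fv(u) (alpha-rename first),
    so that naive substitution is capture-free."""
    tag = t[0]
    if tag == "var":
        return u if t[1] == x else t
    if tag == "app":
        return ("app", subst(t[1], x, u), subst(t[2], x, u))
    # tag == "lam"
    y, body = t[1], t[2]
    return t if y == x else ("lam", y, subst(body, x, u))

def name_step(t):
    """One ->name step: contract the redex found under a name context
    E ::= [] | E t, i.e. never reduce inside arguments or under lambda.
    Returns None on a name-normal form."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":                       # (lam x. b) a ->name b{x/a}
            return subst(f[2], f[1], a)
        g = name_step(f)                        # otherwise reduce the function part
        return None if g is None else ("app", g, a)
    return None                                 # variables and abstractions: no step

I = ("lam", "z", ("var", "z"))
t = ("app", ("app", I, I), ("var", "y"))        # (I I) y ->name I y ->name y
s1 = name_step(t)
s2 = name_step(s1)
```

Note that the argument position of an application is never reduced, exactly as the grammar of E dictates.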

#### **2.2 Head, Weak-Head and Leftmost Reductions**

In order to introduce different notions of reduction, we start by formalising the general mechanism of reduction, which consists in contracting a redex at some specific occurrence. *Occurrences* are finite words over the alphabet {0, 1}. We use ε to denote the empty word and the notation a<sup>n</sup> for n ∈ ℕ concatenations of a letter a of the alphabet. The set of *occurrences* of a given term is defined by induction as follows: oc(x) ≝ {ε}; oc(t u) ≝ {ε} ∪ {0p | p ∈ oc(t)} ∪ {1p | p ∈ oc(u)}; oc(λx.t) ≝ {ε} ∪ {0p | p ∈ oc(t)}.

Given two occurrences p and q, we use the notation p ≤ q to mean that p is a *prefix* of q, *i.e.* there is p′ such that pp′ = q. We denote by t|<sub>p</sub> the *subterm of* t *at occurrence* p, defined as expected [4]; thus, for example, ((λx.y) z)|<sub>00</sub> = y. The set of *redex occurrences* of t is defined by roc(t) ≝ {p ∈ oc(t) | t|<sub>p</sub> = (λx.s) u}. We use the notation r : t →<sub>β</sub> t′ to mean that r ∈ roc(t) and t reduces to t′ by *contracting* the redex at occurrence r, *e.g.* 000 : (λx.(λy.y) x x) z →<sub>β</sub> (λx.x x) z. This notion is extended to reduction sequences as expected, written ρ : t ↠<sub>β</sub> t′, where ρ is the list of all the redex occurrences contracted along the reduction sequence. We use *nil* to denote the empty reduction sequence, so that *nil* : t ↠<sub>β</sub> t holds for every term t.

Any term t has exactly one of the following forms: λx1 ... λxn.y t1 ... tm or λx1 ... λxn.(λy.s) u t1 ... tm with n, m ≥ 0. In the latter case we say that (λy.s) u is the *head redex* of t, while in the former case there is no head redex. Moreover, if n = 0, we say that (λy.s) u is the *weak-head redex* of t. In terms of occurrences, the *head redex* of t is the *minimal* redex occurrence of the form 0ⁿ with n ≥ 0. In particular, if t|0^k is not an abstraction for every k ≤ n, it is the *weak-head redex* of t. A reduction sequence contracting at each step the head redex (resp. weak-head redex) of the corresponding term is called the *head reduction* (resp. *weak-head reduction*).
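The weak-head reduction just described can be sketched as follows (a hedged sketch on our own tuple encoding ('var', x), ('app', t, u), ('lam', x, t); substitution is naive, so bound names are assumed not to capture free variables of the substituted term):

```python
def subst(t, x, u):
    """t{x/u}, without alpha-renaming."""
    if t[0] == 'var':
        return u if t[1] == x else t
    if t[0] == 'app':
        return ('app', subst(t[1], x, u), subst(t[2], x, u))
    return t if t[1] == x else ('lam', t[1], subst(t[2], x, u))

def weak_head_step(t):
    """Contract the weak-head redex of t = (lambda x.s) u t1 ... tm, if any."""
    spine, apps = t, []
    while spine[0] == 'app':
        apps.append(spine)               # remember the application spine
        spine = spine[1]
    if spine[0] != 'lam' or not apps:
        return None                      # t is x t1 ... tn or lambda x.t
    redex = apps[-1]                     # innermost application: (lambda x.s) u
    reduct = subst(spine[2], spine[1], redex[2])
    for node in reversed(apps[:-1]):     # re-apply the remaining arguments
        reduct = ('app', reduct, node[2])
    return reduct

def weak_head_reduce(t, fuel=1000):
    """Iterate weak_head_step until a weak-head normal form is reached."""
    while fuel > 0:
        nxt = weak_head_step(t)
        if nxt is None:
            return t
        t, fuel = nxt, fuel - 1
    raise RuntimeError('no weak-head normal form within the fuel bound')
```

Note that an abstraction is returned unchanged even if its body contains redexes: weak-head reduction never goes under λ.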

Given two redex occurrences r, r' ∈ roc(t), we say that r is *to-the-left of* r' if the anchor of r is to the left of the anchor of r'. Thus for example, the redex occurrence 0 is to-the-left of 1 in the term (I x) (I y), and ε is to-the-left of 00 in (λx.(I I)) z. Alternatively, the relation *to-the-left* can be understood as a dictionary order between redex occurrences, *i.e.* r is *to-the-left of* r' if either r' = rq with q ≠ ε (*i.e.* r is a proper prefix of r'); or r = p0q and r' = p1q' (*i.e.* they share a common prefix and r is on the left-hand side of an application while r' is on the right-hand side). Notice that in any case this implies r' ≰ r. Since this notion defines a total order on redexes, every term not in normal form has a unique *leftmost redex*. The term t *leftmost reduces* to t' if t reduces to t' and the reduction step contracts the leftmost redex of t. For example, (I x) (I y) leftmost reduces to x (I y) and (λx.(I I)) z leftmost reduces to I I. This notion extends to reduction sequences as expected.
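The dictionary-order reading is directly executable on occurrence strings (our '0'/'1' string encoding, with '' for ε):

```python
def to_the_left(r1, r2):
    """r1 is to-the-left of r2, for occurrences encoded as '0'/'1' strings."""
    if r1 == r2 or r1.startswith(r2):
        return False                 # equal, or r2 is a prefix of r1
    if r2.startswith(r1):
        return True                  # r1 is a proper prefix of r2
    i = next(k for k, (a, b) in enumerate(zip(r1, r2)) if a != b)
    return r1[i] == '0'              # at the first divergence, left branch wins

# Since '0' < '1', this order coincides with lexicographic (dictionary) order
# on words, so the leftmost redex of a term is simply min(roc(t)) on strings.
```

The trichotomy noted in the text (equal, one a prefix of the other, or diverging at some letter) is what makes the relation total on occurrences.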

### **3 Towards Neededness**

Needed reduction is based on two fundamental notions: that of residual, which describes how a given redex is traced all along a reduction sequence, and that of normal form, which gives the form of the expected result of the reduction sequence. This section extends the standard notion of needed reduction [8] to those of head and weak-head needed reductions.

#### **3.1 Residuals**

Given a term <sup>t</sup>, <sup>p</sup> <sup>∈</sup> oc(t) and <sup>r</sup> <sup>∈</sup> roc(t), the *descendants of* <sup>p</sup> *after* <sup>r</sup> *in* <sup>t</sup>, written p/r, is the set of *occurrences* defined as follows:

$$p/r \;\stackrel{\mathit{def}}{=}\; \begin{cases} \emptyset & \text{if } p = r \text{ or } p = r0 \\ \{p\} & \text{if } r \not\leq p \\ \{rq\} & \text{if } p = r00q \\ \{rkq \mid s|_k = x\} & \text{if } p = r1q \text{ with } t|_r = (\lambda x.s)\, u \end{cases}$$

For instance, given t = (λx.(λy.x) x) z, we have oc(t) = {ε, 0, 1, 00, 000, 001, 0000}, roc(t) = {ε, 00}, 00/00 = ∅, ε/00 = {ε}, 00/ε = {ε} and 1/ε = {1, 00}.

Notice that p/r ⊆ oc(t') where r : t →β t'. Furthermore, if p is the occurrence of a redex in t (*i.e.* p ∈ roc(t)), then p/r ⊆ roc(t'), and each position in p/r is called a *residual* of p after reducing r. This notion is extended to sets of redex occurrences: the *residuals of* P *after* r *in* t are P/r *def*= ⋃ₚ∈P p/r. In particular ∅/r = ∅. Given ρ : t ↠β t' and P ⊆ roc(t), the *residuals of* P *after the sequence* ρ are: P/*nil* *def*= P and P/rρ' *def*= (P/r)/ρ'.
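The case analysis of p/r and its iteration along a sequence can be sketched executably (our tuple encoding of terms and string encoding of occurrences, as before; substitution is naive and capture-free by assumption):

```python
def oc(t):
    if t[0] == 'var':
        return {''}
    if t[0] == 'app':
        return {''} | {'0' + p for p in oc(t[1])} | {'1' + p for p in oc(t[2])}
    return {''} | {'0' + p for p in oc(t[2])}

def subterm(t, p):
    for c in p:
        t = t[2] if (t[0] == 'lam' or c == '1') else t[1]
    return t

def subst(t, x, u):                      # naive, assumes no variable capture
    if t[0] == 'var':
        return u if t[1] == x else t
    if t[0] == 'app':
        return ('app', subst(t[1], x, u), subst(t[2], x, u))
    return t if t[1] == x else ('lam', t[1], subst(t[2], x, u))

def replace_at(t, p, s):
    if p == '':
        return s
    if t[0] == 'lam':
        return ('lam', t[1], replace_at(t[2], p[1:], s))
    if p[0] == '0':
        return ('app', replace_at(t[1], p[1:], s), t[2])
    return ('app', t[1], replace_at(t[2], p[1:], s))

def contract(t, r):                      # r : t ->beta t'
    redex = subterm(t, r)
    return replace_at(t, r, subst(redex[1][2], redex[1][1], redex[2]))

def descendants(t, p, r):
    """p/r: descendants of occurrence p after contracting the redex at r."""
    if p == r or p == r + '0':
        return set()                     # the redex node and its lambda vanish
    if not p.startswith(r):
        return {p}                       # above or disjoint: unchanged
    if p.startswith(r + '00'):
        return {r + p[len(r) + 2:]}      # inside the body s
    lam = subterm(t, r)[1]               # p = r1q, inside the argument u:
    x, s = lam[1], lam[2]                # one copy per occurrence of x in s
    q = p[len(r) + 1:]
    return {r + k + q for k in oc(s) if subterm(s, k) == ('var', x)}

def residuals(t, P, rho):
    """P/rho for a reduction sequence rho given as a list of redex occurrences."""
    for r in rho:
        P = set().union(*(descendants(t, p, r) for p in P))
        t = contract(t, r)
    return P
```

On the running example t = (λx.(λy.x) x) z, this reproduces 00/00 = ∅, ε/00 = {ε}, 00/ε = {ε} and 1/ε = {1, 00}.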

Stability of the to-the-left relation makes use of the notion of residual:

**Lemma 1.** *Given a term t, let l, r, s ∈ roc(t) such that l is to-the-left of r, s ≠ l and s : t →β t'. Then, l ∈ roc(t') and l is to-the-left of r' for every r' ∈ r/s.*

*Proof.* By case analysis using the definition of *to-the-left* [15]. 

Notice that this result not only implies that the leftmost redex is preserved by reduction of other redexes, but also that the residual of the leftmost redex occurs at exactly the same occurrence as the original one.

**Corollary 1.** *Given a term t and l ∈ roc(t) the leftmost redex of t, if the reduction ρ : t ↠β t' contracts neither l nor any of its residuals, then l ∈ roc(t') is the leftmost redex of t'.*

*Proof.* By induction on the length of ρ using Lemma 1. 

#### **3.2 Notions of Normal Form**

The expected result of evaluating a program is specified by means of some appropriate notion of normal form. Given any relation →R, a term t is said to be in R-*normal form* (NFR) iff there is no t' such that t →R t'. A term t is R-*normalising* (WNR) iff there exists u ∈ NFR such that t ↠R u. Thus, given an R-normalising term t, we can define the set of R-normal forms of t as nfR(t) *def*= {t' | t ↠R t' ∧ t' ∈ NFR}.

In particular, a term in *weak-head* β-*normal form* (WHNFβ) turns out to be of the form x t1 ... tn (n ≥ 0) or λx.t, where t, t1, ..., tn are arbitrary terms, *i.e.* it has no weak-head redex. The set of weak-head β-normal forms of t is whnfβ(t) *def*= {t' | t ↠β t' ∧ t' ∈ WHNFβ}.

Similarly, a term in *head* β-*normal form* (HNFβ) turns out to be of the form λx1 ... λxn.x t1 ... tm (n, m ≥ 0), *i.e.* it has no head redex. The set of head β-normal forms of t is given by hnfβ(t) *def*= {t' | t ↠β t' ∧ t' ∈ HNFβ}.

Last, any term in β-*normal form* (NFβ) has the form λx1 ... λxn.x t1 ... tm (n, m ≥ 0) where t1, ..., tm are themselves in β-normal form. It is well known that the set nfβ(t) is a singleton, so we may use it either as a set or as its unique element.

It is worth noticing that NF<sup>β</sup> ⊂ HNF<sup>β</sup> ⊂ WHNFβ. Indeed, the inclusions are strict, for instance λx.(λy.y) z is in weak-head but not in head β-normal form, while x ((λy.y) x) z is in head but not in β-normal form.
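The three notions of normal form are simple syntactic predicates; here is a sketch on our tuple-encoded terms ('var', x), ('app', t, u), ('lam', x, t):

```python
def spine_head(t):
    """Head of the application spine: for t = h t1 ... tn, return h."""
    while t[0] == 'app':
        t = t[1]
    return t

def spine_args(t):
    """Arguments t1 ... tn of the application spine."""
    out = []
    while t[0] == 'app':
        out.append(t[2])
        t = t[1]
    return out

def is_whnf(t):
    """x t1 ... tn (n >= 0) or lambda x.t."""
    return t[0] == 'lam' or spine_head(t)[0] == 'var'

def is_hnf(t):
    """lambda x1 ... lambda xn. x t1 ... tm."""
    while t[0] == 'lam':
        t = t[2]
    return spine_head(t)[0] == 'var'

def is_nf(t):
    """As is_hnf, but with every argument ti itself in normal form."""
    while t[0] == 'lam':
        t = t[2]
    return spine_head(t)[0] == 'var' and all(is_nf(a) for a in spine_args(t))
```

On the examples of the text, λx.(λy.y) z satisfies is_whnf but not is_hnf, and x ((λy.y) x) z satisfies is_hnf but not is_nf, witnessing the strict inclusions.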

#### **3.3 Notions of Needed Reduction**

The different notions of normal form considered in Sect. 3.2 suggest different notions of needed reduction, besides the standard one in the literature [8]. Indeed, consider r ∈ roc(t). We say that r is *used* in a reduction sequence ρ iff ρ reduces r or some residual of r. Then:

– r is *needed* in t iff every reduction sequence from t to β-normal form uses r;
– r is *head needed* in t iff every reduction sequence from t to head β-normal form uses r;
– r is *weak-head needed* in t iff every reduction sequence from t to weak-head β-normal form uses r.
Notice in particular that nfβ(t) = ∅ (resp. hnfβ(t) = ∅ or whnfβ(t) = ∅) implies that every redex in t is needed (resp. head needed or weak-head needed).

A *one-step reduction* →β is *needed* (resp. *head* or *weak-head needed*), noted →nd (resp. →hnd or →whnd), if the contracted redex is needed (resp. head or weak-head needed). A *reduction sequence* ↠β is *needed* (resp. *head* or *weak-head needed*), noted ↠nd (resp. ↠hnd or ↠whnd), if every reduction step in the sequence is needed (resp. head or weak-head needed).

For instance, consider the reduction sequence:

$$(\lambda y.\lambda x.\,I\,x\,\underline{(I\,I)}_{\,r_1})\,(I\,I) \to_{\mathsf{nd}} (\lambda y.\lambda x.\,\underline{I\,x}_{\,r_2}\,I)\,(I\,I) \to_{\mathsf{nd}} \underline{(\lambda y.\lambda x.\,x\,I)\,(I\,I)}_{\,r_3} \to_{\mathsf{nd}} \lambda x.\,x\,I$$

which is needed but not head needed, since redex r<sup>1</sup> might not be contracted to reach a head normal form:

$$(\lambda y.\lambda x.\,\underline{I\,x}_{\,r_2}\,(I\,I))\,(I\,I) \to_{\mathsf{hnd}} \underline{(\lambda y.\lambda x.\,x\,(I\,I))\,(I\,I)}_{\,r_3} \to_{\mathsf{hnd}} \lambda x.\,x\,(I\,I)$$

Moreover, this second reduction sequence is head needed but not weak-head needed since only redex r<sup>3</sup> is needed to get a weak-head normal form:

$$\underline{(\lambda y.\lambda x.\,I\,x\,(I\,I))\,(I\,I)}_{\,r_3} \to_{\mathsf{whnd}} \lambda x.\,I\,x\,(I\,I)$$

Notice that the following equalities hold: NFnd = NFβ, NFhnd = HNF<sup>β</sup> and NFwhnd = WHNFβ.

Leftmost redexes and reduction sequences are indeed needed:

**Lemma 2.** *The leftmost redex in any term not in normal form (resp. head or weak-head normal form) is needed (resp. head or weak-head needed).*

*Proof.* By contradiction using the definition of *needed* [15]. 

**Theorem 1.** *Let r ∈ roc(t) and ρ : t ↠β t' be the leftmost reduction (resp. head reduction or weak-head reduction) starting at t such that t' = nfβ(t) (resp. t' ∈ hnfβ(t) or t' ∈ whnfβ(t)). Then, r is needed (resp. head or weak-head needed) in t iff r is used in ρ.*

*Proof.* By definition of *needed* using Lemma 2 [15]. 

Notice that the weak-head reduction is a prefix of the head reduction, which is in turn a prefix of the leftmost reduction to normal form. As a consequence, it is immediate to see that every weak-head needed redex is in particular head needed, and every head needed redex is needed as well. For example, consider:

$$\underline{(\lambda y.\lambda x.\,\overline{I\,x}^{\,r_2}\,(\overline{I\,I}^{\,r_3}))\,(\overline{I\,I}^{\,r_4})}_{\,r_1}$$

where r<sup>3</sup> is a needed redex but not head needed nor weak-head needed. However, r<sup>2</sup> is both needed and head needed, while r<sup>1</sup> is the only weak-head needed redex in the term, and r<sup>4</sup> is not needed at all.

### **4 The Type System V**

In this section we recall the (non-idempotent) intersection type system V [14], an extension of those in [12,13], used here to characterise normalising terms w.r.t. the weak-head strategy. More precisely, we show that t is typable in system V if and only if t is normalising when only weak-head needed redexes are contracted. This characterisation is used in Sect. 9 to conclude that the weak-head needed strategy is observationally equivalent to the call-by-need calculus (to be introduced in Sect. 8).

Given a constant type a that denotes *answers* and a countably infinite set B of base type variables α, β, γ, ..., we define the following sets of types:

**(Types)** τ, σ ::= a | α ∈ B | M → τ
**(Multiset types)** M, N ::= {{τᵢ}}ᵢ∈I where I is a finite set

The empty multiset is denoted by {{}}. We remark that types are *strict* [18], *i.e.* the right-hand sides of functional types are never multisets. Thus, the general form of a type is M<sup>1</sup> → ... → M<sup>n</sup> → τ with τ being the constant type or a base type variable.

*Typing contexts* (or just *contexts*), written Γ, Δ, are functions from variables to multiset types, assigning the empty multiset to all but a finite set of variables. The domain of Γ is given by dom(Γ) *def*= {x | Γ(x) ≠ {{}}}. The *union of contexts*, written Γ + Δ, is defined by (Γ + Δ)(x) *def*= Γ(x) ⊎ Δ(x), where ⊎ denotes multiset union. An example is (x : {{σ}}, y : {{τ}}) + (x : {{σ}}, z : {{τ}}) = (x : {{σ, σ}}, y : {{τ}}, z : {{τ}}). This notion is extended to several contexts as expected, so that +ᵢ∈I Γᵢ denotes a finite union of contexts (when I = ∅ the notation is to be understood as the empty context). We write Γ \ x for the context defined by (Γ \ x)(x) *def*= {{}} and (Γ \ x)(y) *def*= Γ(y) if y ≠ x.
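These operations on contexts are directly representable; a minimal sketch (our encoding: contexts as dicts from variable names to `collections.Counter` multisets, with missing variables standing for the empty multiset; the type names are illustrative strings):

```python
from collections import Counter

def ctx_add(*ctxs):
    """Gamma + Delta (+ ...): pointwise multiset union of contexts."""
    out = {}
    for ctx in ctxs:
        for x, m in ctx.items():
            out[x] = out.get(x, Counter()) + m
    return out

def dom(ctx):
    """dom(Gamma): the variables mapped to a non-empty multiset."""
    return {x for x, m in ctx.items() if m}

def ctx_minus(ctx, x):
    """Gamma \\ x: reset x to the empty multiset (i.e. drop its entry)."""
    out = dict(ctx)
    out.pop(x, None)
    return out
```

`Counter` addition is exactly multiset union, so the example of the text, (x : {{σ}}, y : {{τ}}) + (x : {{σ}}, z : {{τ}}), produces x : {{σ, σ}} as expected.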

*Type judgements* have the form Γ ⊢ t : τ, where Γ is a typing context, t is a term and τ is a type. The intersection type system V for the λ-calculus is given in Fig. 1.

$$\frac{}{x : \{\{\tau\}\} \vdash x : \tau}\ (\mathtt{ax}) \qquad \frac{}{\vdash \lambda x.t : \mathtt{a}}\ (\mathtt{val}) \qquad \frac{\Gamma \vdash t : \tau}{\Gamma \setminus x \vdash \lambda x.t : \Gamma(x) \to \tau}\ (\to\mathtt{i}) \qquad \frac{\Gamma \vdash t : \{\{\sigma_i\}\}_{i \in I} \to \tau \quad (\Delta_i \vdash u : \sigma_i)_{i \in I}}{\Gamma +_{i \in I} \Delta_i \vdash t\,u : \tau}\ (\to\mathtt{e})$$

**Fig. 1.** The non-idempotent intersection type system V.

The constant type a in rule (val) is used to type values. The axiom (ax) is relevant (there is no weakening) and the rule (→e) is multiplicative. Note that the argument of an application is typed #(I) times by the premises of rule (→e). A particular case is when <sup>I</sup> <sup>=</sup> <sup>∅</sup>: the subterm <sup>u</sup> occurring in the typed term t u turns out to be untyped.

A *(type) derivation* is a tree obtained by applying the (inductive) typing rules of system V. The notation ▷V Γ ⊢ t : τ means there is a derivation of the judgement Γ ⊢ t : τ in system V. The term t is typable in system V, or V-typable, iff t is the *subject* of some derivation, *i.e.* iff there are Γ and τ such that ▷V Γ ⊢ t : τ. We use the capital Greek letters Φ, Ψ, ... to name type derivations, by writing for example Φ ▷V Γ ⊢ t : τ. For short, we usually denote by Φt a derivation with subject t for some type and context. The *size of the derivation* Φ, denoted by sz(Φ), is defined as the number of nodes of the corresponding derivation tree. We write RULE(Φ) ∈ {(ax), (→i), (→e)} to access the last rule applied in the derivation Φ. Likewise, PREM(Φ) is the *multiset* of proper maximal subderivations of Φ. For instance, given

$$\Phi = \frac{\Phi_t \quad (\Phi_u^i)_{i \in I}}{\Gamma \vdash t\,u : \tau}\ (\to\mathtt{e}),$$

we have RULE(Φ) = (→e) and PREM(Φ) = {{Φt}} ⊎ {{Φ^i_u | i ∈ I}}. We also use the functions CTXT(Φ), SUBJ(Φ) and TYPE(Φ) to access the context, subject and type of the judgement at the root of the derivation tree, respectively. For short, we also use the notation Φ(x) to denote the type associated to the variable x in the typing environment of the conclusion of Φ (*i.e.* Φ(x) *def*= CTXT(Φ)(x)).

Intersection type systems can usually be seen as models [11], *i.e.* typing is stable by convertibility: if t is typable and t =β t', then t' is typable too. This property splits into two different statements known as *subject reduction* and *subject expansion* respectively, the first one giving stability of typing by reduction, the second one by expansion. In the particular case of *non-idempotent types*, subject reduction refines to *weighted subject reduction*, stating not only that typability is stable by reduction, but also that the size of type derivations is non-increasing. Moreover, the decrease is strict when reduction is performed on special occurrences of redexes, called *typed occurrences*. We now introduce all these concepts.

Given a type derivation Φ, the set TOC(Φ) of *typed occurrences* of Φ, which is a subset of oc(SUBJ(Φ)), is defined by induction on the last rule of Φ:

– if RULE(Φ) = (ax) or RULE(Φ) = (val), then TOC(Φ) *def*= {ε};
– if RULE(Φ) = (→i) with PREM(Φ) = {{Φt}}, then TOC(Φ) *def*= {ε} ∪ {0p | p ∈ TOC(Φt)};
– if RULE(Φ) = (→e) with PREM(Φ) = {{Φt}} ⊎ {{Φ^i_u | i ∈ I}}, then TOC(Φ) *def*= {ε} ∪ {0p | p ∈ TOC(Φt)} ∪ (⋃ᵢ∈I {1p | p ∈ TOC(Φ^i_u)}).

Remark that there are two kinds of untyped occurrences: those inside untyped arguments of applications, and those inside untyped bodies of abstractions. For instance, consider the following type derivations:

$$\Phi_K = \dfrac{\dfrac{\dfrac{}{x : \{\{\mathtt{a}\}\} \vdash x : \mathtt{a}}\ (\mathtt{ax})}{x : \{\{\mathtt{a}\}\} \vdash \lambda y.x : \{\{\}\} \to \mathtt{a}}\ (\to\mathtt{i})}{\vdash K : \{\{\mathtt{a}\}\} \to \{\{\}\} \to \mathtt{a}}\ (\to\mathtt{i}) \qquad \Phi_{KI\Omega} = \dfrac{\overbrace{\dfrac{\Phi_K \quad \dfrac{}{\vdash I : \mathtt{a}}\ (\mathtt{val})}{\vdash K\,I : \{\{\}\} \to \mathtt{a}}\ (\to\mathtt{e})}^{\Phi_{KI}}}{\vdash K\,I\,\Omega : \mathtt{a}}\ (\to\mathtt{e})$$

Then, TOC(ΦKIΩ) = {ε, 0, 00, 01, 000, 0000} ⊆ oc(K I Ω).
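The computation of TOC can be sketched on a compact encoding of derivations (ours, not the paper's): ('ax',), ('val',), ('i', premise) and ('e', fun_premise, [arg_premises]), with occurrences as '0'/'1' strings and '' for ε:

```python
def toc(phi):
    """Typed occurrences of a derivation, following the inductive definition."""
    rule = phi[0]
    if rule in ('ax', 'val'):
        return {''}                      # only the root is typed
    if rule == 'i':                      # (->i): the typed body sits below 0
        return {''} | {'0' + p for p in toc(phi[1])}
    fun, arg_prems = phi[1], phi[2]      # (->e): argument typed #(I) times
    return ({''} | {'0' + p for p in toc(fun)}
                 | {'1' + p for prem in arg_prems for p in toc(prem)})

# The derivation of K I Omega from the text: K typed with (ax)/(->i)/(->i),
# I typed with (val), and Omega left untyped (empty I in the outer (->e)).
phi_K = ('i', ('i', ('ax',)))
phi_KI = ('e', phi_K, [('val',)])
phi_KIO = ('e', phi_KI, [])
```

Running `toc(phi_KIO)` reproduces the set {ε, 0, 00, 01, 000, 0000} computed in the text; in particular the occurrence 1 of Ω is absent, since Ω is untyped.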

*Remark 1.* The weak-head redex of a typed term is always a typed occurrence.

Given Φ and p ∈ TOC(Φ), we define Φ|p as the *multiset* of *all the subderivations of* Φ *at occurrence* p (a formal definition can be found in [15]). Note that Φ|p is a multiset since the subterm of SUBJ(Φ) at position p may be typed several times in Φ, due to rule (→e).

We can now state the two main properties of system V, whose proofs can be found in Sect. 7 of [9].

**Theorem 2 (Weighted Subject Reduction).** *Let Φ ▷V Γ ⊢ t : τ. If r : t →β t', then there exists Φ' s.t. Φ' ▷V Γ ⊢ t' : τ. Moreover,*

*1. If r ∈ TOC(Φ), then sz(Φ) > sz(Φ').*
*2. If r ∉ TOC(Φ), then sz(Φ) = sz(Φ').*

**Theorem 3 (Subject Expansion).** *Let Φ' ▷V Γ ⊢ t' : τ. If t →β t', then there exists Φ s.t. Φ ▷V Γ ⊢ t : τ.*

Note that weighted subject reduction implies that reduction of typed redex occurrences turns out to be normalising.

### **5 Substitution and Reduction on Derivations**

In order to relate typed redex occurrences of convertible terms, we now extend the notion of β-reduction to derivation trees, by making use of a natural and basic concept of typed substitution. In contrast to substitution and β-reduction on *terms*, these operations are now both non-deterministic on derivation trees (see [19] for discussions and examples). Given a variable x and type derivations Φt and (Φ^i_u)ᵢ∈I, the *typed substitution* of x by (Φ^i_u)ᵢ∈I in Φt, written Φt{x/(Φ^i_u)ᵢ∈I} by abuse of notation, is a type derivation inductively defined on Φt, and is defined only if Φt(x) = {{TYPE(Φ^i_u)}}ᵢ∈I. This non-deterministic construction may be non-trivial but it can be naturally formalised in a quite straightforward way (full details can be found in [15]). Intuitively, the typed substitution replaces each typed occurrence of x in Φt by a corresponding derivation Φ^i_u matching the same type, where such a matching is chosen in a non-deterministic way. Moreover, it also substitutes all untyped occurrences of x by u, where this untyped operation is completely deterministic. Thus, for example, consider the following substitution, where ΦKI is defined in Sect. 4:

$$\left(\dfrac{\dfrac{}{x : \{\{\{\{\}\} \to \mathtt{a}\}\} \vdash x : \{\{\}\} \to \mathtt{a}}\ (\mathtt{ax})}{x : \{\{\{\{\}\} \to \mathtt{a}\}\} \vdash x\,x : \mathtt{a}}\ (\to\mathtt{e})\right)\{x/\Phi_{KI}\} = \dfrac{\Phi_{KI}}{\vdash (K\,I)\,(K\,I) : \mathtt{a}}\ (\to\mathtt{e})$$

The following lemma relates the typed occurrences of the trees composing a substitution and those of the substituted tree itself:

**Lemma 3.** *Let Φt and (Φ^i_u)ᵢ∈I be derivations such that Φt{x/(Φ^i_u)ᵢ∈I} is defined, and p ∈ oc(t). Then,*


*Proof.* By induction on Φt. 

Based on the previous notion of substitutions on derivations, we are now able to introduce (non-deterministic) reduction on derivation trees. The *reduction relation* →<sup>β</sup> on derivation trees is then defined by first considering the following basic rewriting rules.

1. For typed β-redexes:

$$\dfrac{\dfrac{\Phi_t \,\triangleright_{V}\, \Gamma; x : \{\{\sigma_i\}\}_{i \in I} \vdash t : \tau}{\Gamma \vdash \lambda x.t : \{\{\sigma_i\}\}_{i \in I} \to \tau}\ (\to\mathtt{i}) \quad (\Phi_u^i \,\triangleright_{V}\, \Delta_i \vdash u : \sigma_i)_{i \in I}}{\Gamma +_{i \in I} \Delta_i \vdash (\lambda x.t)\,u : \tau}\ (\to\mathtt{e}) \;\mapsto_\beta\; \Phi_t\{x/(\Phi_u^i)_{i \in I}\}$$

2. For β-redexes in untyped occurrences, with u →<sup>β</sup> u :

$$\dfrac{\Gamma \vdash t : \{\{\}\} \to \tau}{\Gamma \vdash t\,u : \tau}\ (\to\mathtt{e}) \;\mapsto_\nu\; \dfrac{\Gamma \vdash t : \{\{\}\} \to \tau}{\Gamma \vdash t\,u' : \tau}\ (\to\mathtt{e}) \qquad \dfrac{}{\vdash \lambda x.u : \mathtt{a}}\ (\mathtt{val}) \;\mapsto_\xi\; \dfrac{}{\vdash \lambda x.u' : \mathtt{a}}\ (\mathtt{val})$$

As in the case of the λ-calculus, where reduction is closed under usual *term* contexts, we need to close the previous relation under *derivation tree* contexts. However, a one-step reduction on a given subterm causes many one-step reductions in the corresponding derivation tree (recall that Φ|p is defined to be a multiset). Then, informally, given a redex occurrence r of t, a type derivation Φ of t, and the multiset *M* of minimal subderivations of Φ containing r, we apply the rewriting rules ↦β, ↦ν and ↦ξ to all the elements of *M*, thus obtaining a multiset *M'*, and we recompose the type derivation of the reduct of t (see [15] for a formal definition). This gives the reduction relation →β on trees. A reduction sequence on derivation trees contracting only redexes in typed positions is dubbed a *typed reduction sequence*.

Note that typed reductions are normalising by Theorem 2, yielding a special kind of derivation. Indeed, given a type derivation Φ ▷V Γ ⊢ t : τ, we say that Φ is *normal* iff TOC(Φ) ∩ roc(t) = ∅. Reduction on trees induces reduction on terms: when ρ : Φ ↠β Φ', then SUBJ(Φ) ↠β SUBJ(Φ'). By abuse of notation we may denote both sequences by the same letter ρ.

### **6 Weak-Head Neededness and Typed Occurrences**

This section presents one of our main results. It establishes a connection between weak-head needed redexes and typed redex occurrences. More precisely, we first show in Sect. 6.1 that every weak-head needed redex occurrence turns out to be a typed occurrence, whatever its type derivation is. The converse does not however hold. But, we show in Sect. 6.2 that any typed occurrence in a special kind of typed derivation (that we call principal) corresponds to a weak-head needed redex occurrence. We start with a technical lemma.

**Lemma 4.** *Let r : Φt →β Φt' and p ∈ oc(t) such that p ≠ r and p ≠ r0. Then, p ∈ TOC(Φt) iff there exists p' ∈ p/r such that p' ∈ TOC(Φt').*

*Proof.* By induction on <sup>r</sup> using Lemma 3. 

#### **6.1 Weak-Head Needed Redexes Are Typed**

In order to show that every weak-head needed redex occurrence corresponds to a typed occurrence in some type derivation we start by proving that typed occurrences do not come from untyped ones.

**Lemma 5.** *Let ρ : Φt ↠β Φt' and p ∈ oc(t). If there exists p' ∈ p/ρ such that p' ∈ TOC(Φt'), then p ∈ TOC(Φt).*

*Proof.* Straightforward induction on ρ using Lemma 4. 

**Theorem 4.** *Let* r *be a weak-head needed redex in* t*. Let* Φ *be a type derivation of* <sup>t</sup>*. Then,* <sup>r</sup> <sup>∈</sup> TOC(Φ)*.*

*Proof.* By Theorem 1, r is used in the weak-head reduction from t to t' ∈ WHNFβ. By Remark 1, the weak-head reduction contracts only typed redexes. Thus, r or some of its residuals is a typed occurrence in its corresponding derivation tree. Finally, by Lemma 5, we conclude r ∈ TOC(Φ). 

#### **6.2 Principally Typed Redexes Are Weak-Head Needed**

As mentioned before, the converse of Theorem 4 does not hold: there are some typed occurrences that do not correspond to any weak-head needed redex occurrence. This can be illustrated in the following examples (recall ΦKIΩ defined in Sect. 4):

$$\dfrac{\Phi_{KI\Omega}}{\vdash \lambda y.\,K\,I\,\Omega : \{\{\}\} \to \mathtt{a}}\ (\to\mathtt{i}) \qquad \dfrac{\dfrac{}{y : \{\{\{\{\mathtt{a}\}\} \to \mathtt{a}\}\} \vdash y : \{\{\mathtt{a}\}\} \to \mathtt{a}}\ (\mathtt{ax}) \quad \Phi_{KI\Omega}}{y : \{\{\{\{\mathtt{a}\}\} \to \mathtt{a}\}\} \vdash y\,(K\,I\,\Omega) : \mathtt{a}}\ (\to\mathtt{e})$$

Indeed, the occurrence 0 (resp. 1) in the term λy.KIΩ (resp. y (KIΩ)) is typed but not weak-head needed, since both terms are already in weak-head normal form. Fortunately, typing relates to redex occurrences if we restrict type derivations to *principal* ones: given a term t in weak-head β-normal form, the derivation Φ ▷V Γ ⊢ t : τ is *normal principally typed* if:

– t = x t1 ... tn (n ≥ 0), Γ = {x : {{ {{}} → ... → {{}} → τ }}} with n occurrences of {{}}, and τ is a base type variable α (*i.e.* none of the tᵢ are typed), or
– t = λx.t', Γ = ∅ and τ = a.

Given a weak-head normalising term t such that Φt ▷V Γ ⊢ t : τ, we say that Φt is *principally typed* if Φt ↠β Φt' for some t' ∈ whnfβ(t) implies Φt' is normal principally typed.

Note in particular that the previous definition does not depend on the chosen weak-head normal form t': suppose t'' ∈ whnfβ(t) is another weak-head normal form of t; then t' and t'' are convertible terms by the Church-Rosser property [7], so that t' can be normal principally typed iff t'' can, by Theorems 2 and 3.

**Lemma 6.** *Let Φt be a type derivation with subject t and r ∈ roc(t) ∩ TOC(Φt). Let ρ : Φt ↠β Φt' such that Φt' is normal. Then, r is used in ρ.*

*Proof.* Straightforward induction on ρ using Lemma 4. 

The notions of leftmost and weak-head needed reductions on (untyped) terms naturally extend to *typed* reductions on derivation trees. We thus have:

**Lemma 7.** *Let* t *be a weak-head normalising term and* Φ<sup>t</sup> *be principally typed. Then, a leftmost typed reduction sequence starting at* Φ<sup>t</sup> *is weak-head needed.*

*Proof.* By induction on the leftmost typed sequence (called ρ). If ρ is empty, the result is immediate. Otherwise, we show that t has a typed weak-head needed redex (which is leftmost by definition) and conclude by the inductive hypothesis. Indeed, assume t ∈ WHNFβ. By definition, Φt is normal principally typed and thus has no typed redexes. This contradicts ρ being non-empty. Hence, t has a weak-head redex r (*i.e.* t ∉ WHNFβ). Moreover, r is both typed (by Remark 1) and weak-head needed (by Lemma 2). Thus, we conclude. 

**Theorem 5.** *Let* t *be a weak-head normalising term,* Φ<sup>t</sup> *be principally typed and* <sup>r</sup> <sup>∈</sup> roc(t) <sup>∩</sup> TOC(Φt)*. Then,* <sup>r</sup> *is a weak-head needed redex in* <sup>t</sup>*.*

*Proof.* Let ρ : Φt ↠β Φt' be the leftmost typed reduction sequence, where Φt' is normal. Note that Φt' exists by definition of *principally typed*. By Lemma 7, ρ is a weak-head needed reduction sequence. Moreover, by Lemma 6, r is used in ρ. Hence, r is a weak-head needed redex in t. 

As a direct consequence of Theorems 4 and 5, given a weak-head normalising term t, the typed redex occurrences in its principally typed derivation (which always exists) correspond to its weak-head needed redexes. Hence, system V makes it possible to identify all the weak-head needed redexes of a weak-head normalising term.

### **7 Characterising Weak-Head Needed Normalisation**

This section presents one of the main pieces contributing to our observational equivalence result. Indeed, we relate typing with weak-head neededness by showing that any typable term in system V is normalising for weak-head needed reduction. This characterisation highlights the power of intersection types. We start with a technical lemma.

**Lemma 8.** *Let* <sup>Φ</sup> <sup>V</sup> <sup>Γ</sup> <sup>t</sup> : <sup>τ</sup> *. Then,* <sup>Φ</sup> *normal implies* <sup>t</sup> ∈ WHNFβ*.*

*Proof.* By induction on Φ analysing the last rule applied. 

Let ρ : t1 ↠β tn. We say that ρ is a *left-to-right* reduction sequence iff for every i < n, if rᵢ : tᵢ →β tᵢ₊₁ and lᵢ is to the left of rᵢ then, for every j > i such that rⱼ : tⱼ →β tⱼ₊₁, we have rⱼ ∉ {lᵢ}/ρᵢⱼ, where ρᵢⱼ : tᵢ ↠β tⱼ is the corresponding subsequence of ρ. In other words, for every j and every i < j, rⱼ is not a residual of a redex to the left of rᵢ (relative to the given reduction subsequence from tᵢ to tⱼ) [7].

Left-to-right reductions define in particular standard strategies, which give canonical ways to construct reduction sequences from one term to another:

**Theorem 6 (**[7]**).** *If t ↠β t', there exists a left-to-right reduction from t to t'.*

**Theorem 7.** *Let t ∈ Ta. Then, Φ ▷V Γ ⊢ t : τ for some Γ, τ and Φ iff t ∈ WNwhnd.*

*Proof.* ⇒) By Theorem 2 we know that the strategy reducing only typed redex occurrences is normalising, *i.e.* there exist t' and Φ' such that t ↠β t', Φ' ▷V Γ ⊢ t' : τ and Φ' normal. Then, by Lemma 8, t' ∈ WHNFβ. By Theorem 6, there exists a left-to-right reduction ρ : t ↠β t'. Let us write

$$
\rho: t = t\_1 \twoheadrightarrow\_\beta t\_n \twoheadrightarrow\_\beta t'
$$

such that t1, ..., tn−1 ∉ WHNFβ and tn ∈ WHNFβ.

We claim that all reduction steps in t1 ↠β tn are leftmost. Assume towards a contradiction that there exists k < n such that r : tk →β tk+1 and r is not the leftmost redex of tk (written lk). Since ρ is a left-to-right reduction, no residual of lk is contracted after the k-th step. Thus, there is a reduction sequence from tk ∉ WHNFβ to tn ∈ WHNFβ such that lk is not used in it. This leads to a contradiction with lk being weak-head needed in tk by Lemma 2.

As a consequence, there is a leftmost reduction sequence $t \twoheadrightarrow_\beta t_n$. Moreover, by Lemma 2, $t \twoheadrightarrow_{\mathsf{whnd}} t_n \in \mathsf{WHNF}_\beta = \mathsf{NF}_{\mathsf{whnd}}$. Thus, $t \in \mathcal{WN}_{\mathsf{whnd}}$.

⇐) Consider the reduction $\rho : t \twoheadrightarrow_{\mathsf{whnd}} t'$ with $t' \in \mathsf{whnf}_\beta(t)$. Let $\Phi' \triangleright_{\mathcal{V}} \Gamma \vdash t' : \tau$ be the normal principally typed derivation for $t'$ as defined in Sect. 6.2. Finally, we conclude by induction on $\rho$ using Theorem 3, obtaining $\Phi \triangleright_{\mathcal{V}} \Gamma \vdash t : \tau$.

### **8 The Call-by-Need Lambda-Calculus**

This section describes the syntax and the operational semantics of the call-by-need lambda-calculus introduced in [1]. It is more concise than previous specifications of call-by-need [2,3,10,16], but it is operationally equivalent to them [6], so that our results could also be presented using those alternative specifications.

Given a countably infinite set $\mathcal{X}$ of variables $x, y, z, \dots$ we define different syntactic categories for terms, values, list contexts, answers and need contexts:

$$\begin{array}{rl}
\text{(Terms)} & t, u ::= x \in \mathcal{X} \mid t\,u \mid \lambda x.t \mid t[x\backslash u] \\
\text{(Values)} & v ::= \lambda x.t \\
\text{(List contexts)} & \mathtt{L} ::= \Box \mid \mathtt{L}[x\backslash t] \\
\text{(Answers)} & a ::= \mathtt{L}\langle \lambda y.t\rangle \\
\text{(Need contexts)} & \mathtt{M}, \mathtt{N} ::= \Box \mid \mathtt{N}\,t \mid \mathtt{N}[x\backslash t] \mid \mathtt{N}\langle\!\langle x\rangle\!\rangle[x\backslash \mathtt{M}]
\end{array}$$

We denote the set of terms by $\mathcal{T}_e$. Terms of the form $t[x\backslash u]$ are *closures*, and $[x\backslash u]$ is called an *explicit substitution* (ES). The set of $\mathcal{T}_e$-terms without ES is the set of *terms of the* λ*-calculus*, *i.e.* $\mathcal{T}_a$. The notions of *free* and *bound* variables are defined as expected; in particular, $\mathsf{fv}(t[x\backslash u]) \stackrel{\text{def}}{=} (\mathsf{fv}(t) \setminus \{x\}) \cup \mathsf{fv}(u)$, $\mathsf{fv}(\lambda x.t) \stackrel{\text{def}}{=} \mathsf{fv}(t) \setminus \{x\}$, $\mathsf{bv}(t[x\backslash u]) \stackrel{\text{def}}{=} \mathsf{bv}(t) \cup \{x\} \cup \mathsf{bv}(u)$ and $\mathsf{bv}(\lambda x.t) \stackrel{\text{def}}{=} \mathsf{bv}(t) \cup \{x\}$. We extend the standard notion of α-*conversion* to ES, as expected.
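As a concrete reading of these definitions, the following sketch computes free variables for terms with explicit substitutions; the tuple encoding is ours, not the paper's.

```python
# Terms: ('var', x), ('app', t, u), ('lam', x, t), and closures ('es', t, x, u)
# encoding t[x\u]. fv follows the equations in the text above.

def fv(t):
    tag = t[0]
    if tag == 'var':
        return {t[1]}
    if tag == 'app':
        return fv(t[1]) | fv(t[2])
    if tag == 'lam':                     # fv(λx.t) = fv(t) \ {x}
        return fv(t[2]) - {t[1]}
    # fv(t[x\u]) = (fv(t) \ {x}) ∪ fv(u)
    return (fv(t[1]) - {t[2]}) | fv(t[3])

closure = ('es', ('app', ('var', 'y'), ('var', 'x')), 'x', ('var', 'z'))
print(fv(closure))  # (y x)[x\z] has free variables {'y', 'z'}
```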

We use the special notation $\mathtt{N}\langle\!\langle u\rangle\!\rangle$ or $\mathtt{L}\langle\!\langle u\rangle\!\rangle$ when the free variables of $u$ are not captured by the context, *i.e.* there are no abstractions or explicit substitutions in the context that bind the free variables of $u$. Thus for example, given $\mathtt{N} = (\Box\, x)[x\backslash z]$, we have $(y\, x)[x\backslash z] = \mathtt{N}\langle y\rangle = \mathtt{N}\langle\!\langle y\rangle\!\rangle$, but $(x\, x)[x\backslash z] = \mathtt{N}\langle x\rangle$ cannot be written as $\mathtt{N}\langle\!\langle x\rangle\!\rangle$. Notice the use of this special notation in the last case of need contexts, an example of such a case being $(x\, y)[y\backslash t][x\backslash \Box]$.

The *call-by-need calculus*, introduced in [1], is given by the set of terms $\mathcal{T}_e$ and the *reduction relation* $\to_{\mathsf{need}}$, the *union* of $\to_{\mathtt{dB}}$ and $\to_{\mathtt{lsv}}$, which are, respectively, the closure by *need contexts* of the following rewriting rules:

$$\begin{array}{rcl}
\mathtt{L}\langle\lambda x.t\rangle\, u & \longmapsto_{\mathtt{dB}} & \mathtt{L}\langle t[x\backslash u]\rangle \\
\mathtt{N}\langle\!\langle x\rangle\!\rangle[x\backslash \mathtt{L}\langle v\rangle] & \longmapsto_{\mathtt{lsv}} & \mathtt{L}\langle \mathtt{N}\langle\!\langle v\rangle\!\rangle[x\backslash v]\rangle
\end{array}$$

These rules avoid capture of free variables. An example of a need-reduction sequence is the following, where $I$ denotes the identity function $\lambda z.z$:

$$\begin{array}{rl}
& (\lambda x_1.\, I\,(x_1\, I))\,(\lambda y.\, I\, y) \\
\to_{\mathtt{dB}} & (I\,(x_1\, I))[x_1\backslash \lambda y.\, I\, y] \\
\to_{\mathtt{dB}} & x_2[x_2\backslash x_1\, I][x_1\backslash \lambda y.\, I\, y] \\
\to_{\mathtt{lsv}} & x_2[x_2\backslash (\lambda x_3.\, I\, x_3)\, I][x_1\backslash \lambda y.\, I\, y] \\
\to_{\mathtt{dB}} & x_2[x_2\backslash (I\, x_3)[x_3\backslash I]][x_1\backslash \lambda y.\, I\, y] \\
\to_{\mathtt{dB}} & x_2[x_2\backslash x_4[x_4\backslash x_3][x_3\backslash I]][x_1\backslash \lambda y.\, I\, y] \\
\to_{\mathtt{lsv}} & x_2[x_2\backslash x_4[x_4\backslash I][x_3\backslash I]][x_1\backslash \lambda y.\, I\, y] \\
\to_{\mathtt{lsv}} & x_2[x_2\backslash I[x_4\backslash I][x_3\backslash I]][x_1\backslash \lambda y.\, I\, y] \\
\to_{\mathtt{lsv}} & I[x_2\backslash I][x_4\backslash I][x_3\backslash I][x_1\backslash \lambda y.\, I\, y]
\end{array}$$

As for call-by-name, reduction preserves free variables, *i.e.* $t \to_{\mathsf{need}} t'$ implies $\mathsf{fv}(t) \supseteq \mathsf{fv}(t')$. Notice that call-by-need reduction is also weak, so that answers are not need-reducible.
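The dB rule can be sketched operationally: peel the list context L off the function position, then wrap it back around the new closure. This is a root-step sketch only (it does not search under need contexts, and the lsv rule is omitted); the encoding is ours.

```python
# Terms: ('var', x), ('app', t, u), ('lam', x, t), ('es', t, x, u) for t[x\u].

def db_root(t):
    """Apply L<λx.b> u ↦ L<b[x\\u]> at the root, or return None."""
    if t[0] != 'app':
        return None
    f, u = t[1], t[2]
    subs = []
    while f[0] == 'es':             # peel the list context L, outermost first
        subs.append((f[2], f[3]))
        f = f[1]
    if f[0] != 'lam':
        return None
    out = ('es', f[2], f[1], u)     # the new closure b[x\u]
    for x, v in reversed(subs):     # re-wrap L, innermost first
        out = ('es', out, x, v)
    return out

I = ('lam', 'z', ('var', 'z'))
t = ('app', ('es', ('lam', 'x', ('var', 'x')), 'a', I), ('var', 'u'))
print(db_root(t))  # (λx.x)[a\I] u ↦ x[x\u][a\I]
```

The distinctive point of the rule is visible in the code: the abstraction is applied *through* its surrounding explicit substitutions, which are preserved around the result.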

### **9 Observational Equivalence**

The results in Sect. 7 are used here to prove soundness and completeness of call-by-need w.r.t. weak-head neededness, our second main result. More precisely, a call-by-need interpreter stops in a value if and only if the weak-head needed reduction stops in a value. This means that call-by-need and call-by-name are observationally equivalent.

Formally, given a reduction relation $\mathcal{R}$ on a term language $\mathcal{T}$, and an associated notion of context for $\mathcal{T}$, we define $t$ to be *observationally equivalent* to $u$, written $t \cong_{\mathcal{R}} u$, iff $\mathtt{C}\langle t\rangle \in \mathcal{WN}_{\mathcal{R}} \Leftrightarrow \mathtt{C}\langle u\rangle \in \mathcal{WN}_{\mathcal{R}}$ for every context $\mathtt{C}$. In order to show our final result we resort to the following theorem:

#### **Theorem 8 (**[14]**).**


These observations allow us to conclude:

**Theorem 9.** *For all terms* $t$ *and* $u$ *in* $\mathcal{T}_a$*,* $t \cong_{\mathsf{whnd}} u$ *iff* $t \cong_{\mathsf{need}} u$*.*

*Proof.* By Theorem 8(2) it is sufficient to show $t \cong_{\mathsf{whnd}} u$ iff $t \cong_{\mathsf{name}} u$. The proof proceeds as follows:


 

### **10 Conclusion**

We establish a clear connection between the semantic standard notion of neededness and the syntactic concept of call-by-need. The use of non-idempotent types (a powerful technique able to characterise different operational properties) provides a simple and natural tool to show observational equivalence between these two notions. We refer the reader to [5] for other proof techniques (not based on intersection types) used to connect semantic notions of neededness with syntactic notions of lazy evaluation.

An interesting (and not difficult) extension of our result in Sect. 6 is that call-by-need reduction (defined on λ-terms with explicit substitutions) contracts only dB weak-head needed redexes, for an appropriate (and very natural) notion of weak-head needed redex for λ-terms with explicit substitutions. A technical tool to obtain such a result would be the type system A [14], a straightforward adaptation of system V to call-by-need syntax.

Given the recent formulation of *strong call-by-need* [6] describing a deterministic call-by-need strategy to normal form (instead of weak-head normal form), it would be natural to extend our technique to obtain an observational equivalence result between the standard notion of needed reduction (to full normal forms) and the strong call-by-need strategy. This remains as future work.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Fitch-Style Modal Lambda Calculi**

Ranald Clouston(B)

Department of Computer Science, Aarhus University, Aarhus, Denmark ranald.clouston@cs.au.dk

**Abstract.** Fitch-style modal deduction, in which modalities are eliminated by opening a subordinate proof, and introduced by shutting one, was investigated in the 1990s as a basis for lambda calculi. We show that such calculi have good computational properties for a variety of intuitionistic modal logics. Semantics are given in cartesian closed categories equipped with an adjunction of endofunctors, with the necessity modality interpreted by the right adjoint. Where this functor is an idempotent comonad, a coherence result on the semantics allows us to present a calculus for intuitionistic S4 that is simpler than others in the literature. We show the calculi can be extended à la tense logic with the left adjoint of necessity, and are then complete for the categorical semantics.

**Keywords:** Intuitionistic modal logic · Typed lambda calculi · Categorical semantics

### **1 Introduction**

The Curry-Howard propositions-as-types isomorphism [21,39,41] provides a correspondence between natural deduction and typed lambda calculus of interest to both logicians and computer scientists. For the logician, term assignment offers a convenient notation to express and reason about syntactic properties such as proof normalisation, and, especially in the presence of dependent types, allows proofs of non-trivial mathematical theorems to be checked by computer programs. For the computer scientist, logics have been repurposed as typing disciplines to address problems in computing in sometimes surprising ways. Following Lambek [25], categories form a third leg of the isomorphism. Categorical semantics can be used to prove the consistency of a calculus, and they are crucial if we wish to prove or program in some particular mathematical setting. For example, see the use of the topos of trees as a setting for both programming with guarded recursion, and proof by Löb induction, by Clouston et al. [11].

This work involved two functors, 'later' and 'constant'. Where functors interact appropriately with finite products they correspond to necessity modalities in

We gratefully acknowledge discussions with Patrick Bahr, Lars Birkedal, Aleš Bizjak, Christian Uldal Graulund, G.A. Kavvos, Bassel Mannaa, Rasmus Ejlers Møgelberg, Andrew M. Pitts, and Bas Spitters, and the comments of the anonymous referees. This research was supported by a research grant (12386) from Villum Fonden.

c The Author(s) 2018

C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 258–275, 2018. https://doi.org/10.1007/978-3-319-89366-2\_14

intuitionistic normal modal logic, usually written □. Such modalities have been extensively studied by logicians, and the corresponding type-formers are widely applicable in computing, for example to monads [32], staged programming [13], propositional truncation [2], and recent work in homotopy type theory [37]. There is hence a need to develop all sides of the Curry-Howard-Lambek isomorphism for necessity modalities. Approaches to modal lambda calculi are diverse; see the survey by Kavvos [23], and remarks in the final section of this paper. This paper focuses on *Fitch-style* modal lambda calculi as first proposed by Borghuis [9] and (as the "two-dimensional" approach) by Martini and Masini [29].

Fitch-style modal lambda calculi<sup>1</sup> adapt the proof methods of Fitch [19] in which given a formula □A we may open a '(strict) subordinate proof' in which we eliminate the □ to get premise A. Such a subordinate proof with conclusion B can then be shut by introducing a □ to conclude □B. Different modal logics can be encoded by tweaking the open and shut rules; for example we could shut the proof to conclude merely B, if we had the T axiom □B → B. Normal modal logics are usually understood with respect to Kripke's possible worlds semantics (for the intuitionistic version, see e.g. Simpson [38, Sect. 3.3]). In this setting Fitch's approach is highly intuitive, as opening a subordinate proof corresponds to travelling to a generic related world, while shutting corresponds to returning to the original world. See Fitting [20, Chap. 4] for a lengthier discussion of this approach to natural deduction.

Borghuis [9] kept track of subordinate proofs in a sequent presentation by introducing a new structural connective to the context when a □ is eliminated, and removing it from the context when one is introduced, in a style reminiscent of the treatment of modal logic in display calculus [42], or for that matter of the standard duality between implication and comma. To the category theorist, this suggests an operation on contexts *left adjoint* to □. This paper exploits this insight by presenting categorical semantics for Fitch-style modal calculi for the first time, answering the challenge of de Paiva and Ritter [33, Sect. 4], by modelling necessity modalities as right adjoints. This is logically sound and complete, yet less general than modelling modalities as monoidal functors as done for example by Bellin et al. [4]. For example, truncation in sets is monoidal but has no right adjoint. Nonetheless adjunctions are ubiquitous, and in their presence we argue that the case for Fitch-style calculi is compelling. Examples of right adjoints of interest to type theorists include the aforementioned modalities of guarded recursion, the closure modalities of (differential) cohesive ∞-toposes [36, Sect. 3], and atom-abstraction in nominal sets [31].

In Sect. 2 we present Borghuis's calculus for the logic Intuitionistic K, the most basic intuitionistic modal logic of necessity. To the results of confluence, subject reduction, and strong normalisation already shown by Borghuis we add canonicity and the subformula property, with the latter proof raising a subtle issue with sums not previously observed. We give categorical semantics for this style of calculus for the first time and prove soundness. In Sect. 3 we introduce the

<sup>1</sup> 'Fitch-style' deduction can also be used to mean the linear presentation of natural deduction with subordinate proofs for implication.

left adjoint as a first-class type former à la intuitionistic tense logic [17], in which the "everywhere in the future" modality is paired with "somewhere in the past". To our knowledge this is the first natural deduction calculus, let alone lambda calculus, for any notion of tense logic. It is not entirely satisfactory as it lacks the subformula property, but it does allow us to prove categorical completeness. In Sect. 4 we show how the basic techniques developed for Intuitionistic K extend to Intuitionistic S4, one of the most-studied intuitionistic modal logics. Instead of working with known Fitch-style calculi for this logic [13,34] we explore a new, particularly simple, calculus where the modality is *idempotent*, i.e. □A and □□A are not merely logically equivalent, but isomorphic. Our semantics for this calculus rely on an unusual 'coherence' proof. In Sect. 5 we present a calculus corresponding to the logic Intuitionistic R. In Sect. 6 we conclude with a discussion of related and further work.

### **2 Intuitionistic K**

This section presents results for the calculus of Borghuis [9] for the most basic modal logic for necessity, first identified to our knowledge by Božić et al. [10] as HK□; following Yokota [43] we use the name Intuitionistic K (IK). This logic extends intuitionistic logic with a new unary connective □, one new axiom

K: $\Box(A \to B) \to \Box A \to \Box B$

and one new inference rule

*Necessitation:* if $A$ is a theorem, then so is $\Box A$.

#### **2.1 Type System**

Contexts are defined by the grammar

$$\Gamma \triangleq \cdot \mid \Gamma, x:A \mid \Gamma, 🔓$$

where $x$ is a variable not in Γ, $A$ is a formula of intuitionistic modal logic, and 🔓 is called a *lock*. The open lock symbol is used to suggest that a box has been opened, allowing access to its contents.

Ignoring variables and terms, sequents $\Gamma \vdash A$ may be interpreted as intuitionistic modal formulae by the translation

$$\begin{array}{l}
- \ [\![\,\cdot \vdash A\,]\!] = A; \\
- \ [\![\,B, \Gamma \vdash A\,]\!] = B \to [\![\,\Gamma \vdash A\,]\!]; \\
- \ [\![\,🔓, \Gamma \vdash A\,]\!] = \Box\,[\![\,\Gamma \vdash A\,]\!].
\end{array}$$
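Run symbolically, the translation looks like this; the list encoding of contexts and the formula constructors are our illustrative choices, not the paper's.

```python
# Contexts are lists read left to right; an entry is either a formula or the
# token 'LOCK'. Formulas: ('atom', s), ('imp', a, b), ('box', a).

def interp(ctx, a):
    """Translate the sequent ctx ⊢ a into a single modal formula."""
    if not ctx:
        return a
    head, rest = ctx[0], ctx[1:]
    if head == 'LOCK':                        # [🔓, Γ ⊢ A] = □[Γ ⊢ A]
        return ('box', interp(rest, a))
    return ('imp', head, interp(rest, a))     # [B, Γ ⊢ A] = B → [Γ ⊢ A]

A, B = ('atom', 'A'), ('atom', 'B')
print(interp(['LOCK', B], A))  # → ('box', ('imp', ('atom', 'B'), ('atom', 'A')))
```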

This interpretation will suffice to confirm the soundness and completeness of our calculus, considered as a natural deduction calculus, with respect to IK. It is however not a satisfactory basis for a categorical semantics, because it does not interpret the context as an object. In Sect. 2.3 we shall see that 🔓 may instead be interpreted as a *left adjoint* of □, applied to the context to its left.

Figure 1 presents the typing rules. Rules for the product constructions $1$, $A \times B$, $\langle\rangle$, $\langle t, u\rangle$, $\pi_1\, t$, $\pi_2\, t$ are as usual and so are omitted, while sums are discussed at the end of Sect. 2.2. Note that variables can only be introduced or abstracted if they do not appear to the left of a lock. In the variable rule the context Γ builds in variable exchange, while in the open rule Γ′ builds in variable weakening. Exchange of variables with locks, and weakening for locks, are not admissible.

$$\frac{}{\Gamma, x:A, \Gamma' \vdash x : A}\ 🔓 \notin \Gamma' \qquad \frac{\Gamma, x:A \vdash t : B}{\Gamma \vdash \lambda x.t : A \to B} \qquad \frac{\Gamma \vdash t : A \to B \quad \Gamma \vdash u : A}{\Gamma \vdash t\, u : B}$$

$$\frac{\Gamma, 🔓 \vdash t : A}{\Gamma \vdash \mathsf{shut}\ t : \Box A} \qquad \frac{\Gamma \vdash t : \Box A}{\Gamma, 🔓, \Gamma' \vdash \mathsf{open}\ t : A}\ 🔓 \notin \Gamma'$$

**Fig. 1.** Typing rules for Intuitionistic K
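As an executable reading of Fig. 1, here is a small checker for the □-fragment (Church-style annotations on λ, with products and sums omitted); the encodings are our own sketch, not the paper's syntax.

```python
# Contexts: lists of ('var', name, type) entries and 'LOCK' tokens.
# Types: ('atom', s), ('imp', a, b), ('box', a).

def typeof(ctx, t):
    tag = t[0]
    if tag == 'var':
        # variable rule: fails if a lock stands to the right of the binding
        for entry in reversed(ctx):
            if entry == 'LOCK':
                raise TypeError('variable occurs to the left of a lock')
            if entry[1] == t[1]:
                return entry[2]
        raise TypeError('unbound variable')
    if tag == 'lam':                                # ('lam', x, A, body)
        _, x, a, body = t
        return ('imp', a, typeof(ctx + [('var', x, a)], body))
    if tag == 'app':
        f, a = typeof(ctx, t[1]), typeof(ctx, t[2])
        if f[0] == 'imp' and f[1] == a:
            return f[2]
        raise TypeError('ill-typed application')
    if tag == 'shut':                               # Γ,🔓 ⊢ t : A  ⇒  Γ ⊢ shut t : □A
        return ('box', typeof(ctx + ['LOCK'], t[1]))
    if tag == 'open':                               # Γ ⊢ t : □A  ⇒  Γ,🔓,Γ' ⊢ open t : A
        if 'LOCK' not in ctx:
            raise TypeError('open needs a lock in the context')
        i = len(ctx) - 1 - ctx[::-1].index('LOCK')  # last lock, so Γ' is lock-free
        a = typeof(ctx[:i], t[1])
        if a[0] != 'box':
            raise TypeError('open expects a boxed type')
        return a[1]
    raise TypeError('unknown term-former')

# The K axiom term: λf.λx.shut ((open f) (open x))
A, B = ('atom', 'A'), ('atom', 'B')
k_term = ('lam', 'f', ('box', ('imp', A, B)),
          ('lam', 'x', ('box', A),
           ('shut', ('app', ('open', ('var', 'f')), ('open', ('var', 'x'))))))
print(typeof([], k_term))  # the K axiom type □(A→B) → □A → □B
```

Checking open against the context up to the *last* lock is one way to implement the side condition 🔓 ∉ Γ′; the entries after that lock are exactly the weakening Γ′.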

**Theorem 2.1 (Logical Soundness and Completeness).** *A formula is a theorem of* IK *if and only if it is an inhabited type in the empty context.*

We can for example show that the K axiom is inhabited:

$$\frac{f:\Box(A \to B), x:\Box A, 🔓 \vdash \mathsf{open}\ f : A \to B \qquad f:\Box(A \to B), x:\Box A, 🔓 \vdash \mathsf{open}\ x : A}{f:\Box(A \to B), x:\Box A, 🔓 \vdash (\mathsf{open}\ f)(\mathsf{open}\ x) : B}$$

#### **2.2 Computation**

We extend the usual notion of β-reduction on untyped terms with the rule

```
open (shut t) → t
```
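On a tree representation of terms (a hypothetical encoding of ours, with 'open' and 'shut' as unary nodes), this rule is a one-line root rewrite:

```python
def beta_modal(t):
    """Root step of the rule open (shut t) → t; None if it does not apply."""
    if t[0] == 'open' and t[1][0] == 'shut':
        return t[1][1]
    return None

print(beta_modal(('open', ('shut', ('var', 'x')))))  # → ('var', 'x')
```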
We write ↠ for the reflexive transitive closure of →. This relation is plainly confluent. Two lemmas, proved by easy inductions on the derivation of the term $t$, then allow us to prove subject reduction:

**Lemma 2.2 (Variable Weakening).** *If* $\Gamma, \Gamma' \vdash t : B$ *then* $\Gamma, x:A, \Gamma' \vdash t : B$*.*

**Lemma 2.3 (Substitution).** *If* $\Gamma, x:A, \Gamma' \vdash t : B$ *and* $\Gamma \vdash u : A$ *then* $\Gamma, \Gamma' \vdash t[u/x] : B$*.*

**Theorem 2.4 (Subject Reduction).** *If* $\Gamma \vdash t : A$ *and* $t \to u$ *then* $\Gamma \vdash u : A$*.*

*Proof.* β-reduction for → requires Lemma 2.3, and β-reduction for □ requires Lemma 2.2.

A term t is *normalisable* if there exists an integer ν(t) bounding the length of any reduction sequence starting with t, and *normal* if ν(t) is 0. By standard techniques we prove the following theorems:

**Theorem 2.5 (Strong Normalisation).** *Given* $\Gamma \vdash t : A$*, the term* $t$ *is normalisable.*

**Theorem 2.6 (Canonicity).** *If* Γ *is a context containing no variable assignments,* $\Gamma \vdash t : A$*, and* $t$ *is normal, then the main term-former of* $t$ *is the introduction for the main type-former of* $A$*.*

Concretely, if A is some base type then t is a value of that type.

**Theorem 2.7 (Subformula Property).** *Given* $\Gamma \vdash t : A$ *with* $t$ *normal, all subterms of* $t$ *have as their type in the derivation tree a subtype of* $A$*, or a subtype of a type assigned in* Γ*.*

To attain this final theorem we need to take some care with sums. It is well known that lambda calculi with sums do not enjoy the subformula property unless they have additional reductions called commuting conversions [21, Chap. 10]. However the commuting conversions for the □ type

$$\begin{aligned}
\mathsf{open}\ (\mathsf{case}\ s\ \mathsf{of}\ x.t;\ y.u) &\mapsto \mathsf{case}\ s\ \mathsf{of}\ x.\mathsf{open}\ t;\ y.\mathsf{open}\ u \\
\mathsf{open}\ (\mathsf{abort}\ t) &\mapsto \mathsf{abort}\ t
\end{aligned}$$

do not obviously enjoy subject reduction because open might change the context. However if we tweak the definitions of the elimination term-formers for sums according to Fig. 2 then all results of this section indeed hold.

$$\frac{\Gamma \vdash s : A + B \qquad \Gamma, x:A, \Gamma' \vdash t : C \qquad \Gamma, y:B, \Gamma' \vdash u : C}{\Gamma, \Gamma' \vdash \mathsf{case}\ s\ \mathsf{of}\ x.t;\ y.u : C} \qquad \frac{\Gamma \vdash t : 0}{\Gamma, \Gamma' \vdash \mathsf{abort}\ t : C}$$

Finally, while we will not explore computational aspects of η-equivalence in this paper, we do note that

$$\mathsf{shut}\ (\mathsf{open}\ t) = t$$

obeys subject reduction in both directions (provided, in the expansion case, that the type of $t$ has □ as its main type-former).

#### **2.3 Categorical Semantics**

This section goes beyond Theorem 2.1 to establish the soundness of the type system with respect to a *categorical semantics*, in cartesian closed categories $\mathcal{C}$ equipped with an endofunctor □ that has a *left adjoint*, which we write ♠.

We interpret types as C-objects via the structure of C in the obvious way. We then interpret contexts as C-objects by

$$\begin{array}{l}
- \ [\![\,\cdot\,]\!] \triangleq 1; \\
- \ [\![\,\Gamma, x:A\,]\!] \triangleq [\![\Gamma]\!] \times A; \\
- \ [\![\,\Gamma, 🔓\,]\!] \triangleq ♠\,[\![\Gamma]\!].
\end{array}$$
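The three clauses can be traced symbolically, with strings standing in for objects; this is a sketch of ours, not part of the paper.

```python
def interp_ctx(ctx):
    """⟦·⟧ = 1, ⟦Γ, x:A⟧ = ⟦Γ⟧ × A, ⟦Γ, 🔓⟧ = ♠⟦Γ⟧, rendered as a string."""
    obj = '1'
    for entry in ctx:                  # entries: a type name, or the token 'LOCK'
        obj = f'♠({obj})' if entry == 'LOCK' else f'({obj} × {entry})'
    return obj

print(interp_ctx(['A', 'LOCK', 'B']))  # → (♠((1 × A)) × B)
```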

We omit the brackets $[\![\cdots]\!]$ where no confusion is possible, and usually abuse notation by omitting the left-most '1 ×' where the left of the context is a variable.

We will also sometimes interpret contexts Γ as endofunctors, abusing notation to also write them as $[\![\Gamma]\!]$, or merely Γ, by taking $[\![\cdot]\!]$ as the identity, $[\![\Gamma, x:A]\!] = [\![\Gamma]\!] \times A$, and $[\![\Gamma, 🔓]\!] = ♠\,[\![\Gamma]\!]$.

We interpret $\Gamma \vdash t : A$ as a $\mathcal{C}$-arrow $[\![\Gamma \vdash t : A]\!] : [\![\Gamma]\!] \to A$, often abbreviated to $[\![t]\!]$, or merely $t$, by induction on the derivation of $t$ as follows.

Standard constructions such as variables, abstraction and application are interpreted as usual. To interpret the rules for sums of Fig. 2 we use the fact that ♠, as a left adjoint, preserves colimits.

shut: we simply apply the isomorphism $\mathcal{C}(♠[\![\Gamma]\!], A) \cong \mathcal{C}([\![\Gamma]\!], \Box A)$ given by the adjunction.

open: we apply the isomorphism $\mathcal{C}([\![\Gamma]\!], \Box A) \cong \mathcal{C}(♠[\![\Gamma]\!], A)$ to the arrow interpreting the premise, then compose with the projection $[\![\Gamma, 🔓, \Gamma']\!] \to [\![\Gamma, 🔓]\!]$.

**Theorem 2.8 (Categorical Soundness).** *If* $\Gamma \vdash t : A$ *and* $t \to t'$ *then* $[\![t]\!] = [\![t']\!]$*.*

We also have that η-equivalent terms have the same denotation.

### **3 Left Adjoints and Categorical Completeness**

In this section we extend the calculus to include the left adjoint ♠ as a first-class type-former, and hence prove categorical completeness. The underlying logic is the fragment of intuitionistic tense logic [17] with just one pair of modalities, studied by Dzik et al. [15] as 'intuitionistic logic with a Galois connection'; we use the name IK♠. We have two new axioms

$$\eta^m : A \to \Box\,♠A \qquad\qquad \varepsilon^m : ♠\,\Box A \to A$$

We use the superscript $m$ to identify these as the unit and counit of the *modal* adjunction $♠ \dashv \Box$, to differentiate them from other (co)units used elsewhere in the paper. We have one new inference rule:

*Monotonicity:* if $A \to B$ is a theorem, then so is $♠A \to ♠B$.

#### **3.1 Type System and Computation**

We extend the type system of Fig. 1 with the new rules for ♠ presented in Fig. 3. ♠, unlike □, need not commute with products, so does not interact well with contexts. Hence the subterms of a let dia term may not share variables.

$$\frac{\Gamma \vdash t : A}{\Gamma, 🔓, \Gamma' \vdash \mathsf{dia}\ t : ♠A}\ 🔓 \notin \Gamma' \qquad\qquad \frac{\Gamma \vdash t : ♠A \qquad x:A, 🔓 \vdash u : B}{\Gamma \vdash \mathsf{let}\ \mathsf{dia}\ x\ \mathsf{be}\ t\ \mathsf{in}\ u : B}$$

We can construct the axioms of IK♠:

$$\frac{x:A, 🔓 \vdash \mathsf{dia}\ x : ♠A}{x:A \vdash \mathsf{shut}\ \mathsf{dia}\ x : \Box♠A} \qquad \frac{x:♠\Box A \vdash x : ♠\Box A \qquad y:\Box A, 🔓 \vdash \mathsf{open}\ y : A}{x:♠\Box A \vdash \mathsf{let}\ \mathsf{dia}\ y\ \mathsf{be}\ x\ \mathsf{in}\ \mathsf{open}\ y : A}$$

and given a closed term <sup>f</sup> : <sup>A</sup> <sup>→</sup> <sup>B</sup> we have the monotonicity construction

$$\frac{x:♠A \vdash x : ♠A \qquad y:A, 🔓 \vdash \mathsf{dia}\,(f\,y) : ♠B}{x:♠A \vdash \mathsf{let}\ \mathsf{dia}\ y\ \mathsf{be}\ x\ \mathsf{in}\ \mathsf{dia}\,(f\,y) : ♠B}$$

To this we add the new β rule

$$\mathsf{let}\ \mathsf{dia}\ x\ \mathsf{be}\ \mathsf{dia}\ t\ \mathsf{in}\ u \longmapsto u[t/x]$$
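On a small term encoding (ours, not the paper's), this β-rule is direct substitution; the sketch below handles only the term-formers it needs.

```python
# Terms: ('var', x), ('dia', t), ('letdia', x, t, u) for let dia x be t in u.

def subst(t, x, s):
    """t[s/x], naive; adequate here since u's context is just x:A, 🔓."""
    if t[0] == 'var':
        return s if t[1] == x else t
    if t[0] == 'dia':
        return ('dia', subst(t[1], x, s))
    y, a, b = t[1], t[2], t[3]                # ('letdia', y, a, b)
    return ('letdia', y, subst(a, x, s), b if y == x else subst(b, x, s))

def letdia_beta(t):
    """Root step of: let dia x be (dia s) in u ↦ u[s/x]; None otherwise."""
    if t[0] == 'letdia' and t[2][0] == 'dia':
        return subst(t[3], t[1], t[2][1])
    return None

t = ('letdia', 'x', ('dia', ('var', 's')), ('dia', ('var', 'x')))
print(letdia_beta(t))  # → ('dia', ('var', 's'))
```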

We can hence extend the syntactic results of the previous section to the logic IK♠, with the exception of the subformula property. Consider the term

$$\frac{x:♠A \vdash \mathsf{let}\ \mathsf{dia}\ y\ \mathsf{be}\ x\ \mathsf{in}\ \lambda z.\mathsf{dia}\ y : ♠A \to ♠A \qquad x:♠A \vdash x : ♠A}{x:♠A \vdash (\mathsf{let}\ \mathsf{dia}\ y\ \mathsf{be}\ x\ \mathsf{in}\ \lambda z.\mathsf{dia}\ y)\,x : ♠A}$$

This term is normal but evidently fails the subformula property. One might expect, as with sums, that a commuting conversion would save the day by reducing the term to let dia y be x in ((λz.dia y)x), but this term sees the free variable x appear in the second subterm of a let dia expression, which is not permitted.

We now turn to η-equivalence, and an equivalence which we call *associativity*:

$$\begin{aligned}
\mathsf{let}\ \mathsf{dia}\ x\ \mathsf{be}\ t\ \mathsf{in}\ \mathsf{dia}\ x &= t \\
\mathsf{let}\ \mathsf{dia}\ x\ \mathsf{be}\ s\ \mathsf{in}\ (t[u/y]) &= t[(\mathsf{let}\ \mathsf{dia}\ x\ \mathsf{be}\ s\ \mathsf{in}\ u)/y] \quad \text{if $t$'s context contains $y$ only}
\end{aligned}$$

For example, under associativity the counter-example to the subformula property equals (λz.let dia y be x in dia y)x, which reduces to let dia y be x in dia y, which is η-equal to x. The equivalences enjoy subject reduction in both directions (requiring, as usual, that t has the right type for η-expansion).

#### **3.2 Categorical Semantics**

We interpret the new term-formers in the same categories as used in Sect. 2.3. For dia, given $[\![t]\!] : [\![\Gamma]\!] \to A$ we compose $♠[\![t]\!]$ with the projection $[\![\Gamma, 🔓, \Gamma']\!] \to [\![\Gamma, 🔓]\!]$. The denotation of let dia $x$ be $t$ in $u$ is simply $[\![u]\!] \circ [\![t]\!]$. We may then confirm the soundness of β-reduction, η-equivalence, and associativity; we call these equivalences collectively *definitional equivalence*.

We extend standard techniques for proving completeness [25], constructing a *term model*, a category with types as objects and, as arrows $A \to B$, terms of the form $x : A \vdash t : B$ modulo definitional equivalence. This is a category by taking identity as the term $x$ and composition $u \circ t$ as $u[t/x]$. It is a cartesian closed category using the type- and term-formers for products and function spaces.

The modalities □ and ♠ act on types; they also act on terms by, for ♠, the monotonicity construction, and for □, mapping $x : A \vdash t : B$ to $x : \Box A \vdash \mathsf{shut}\ t[\mathsf{open}\ x/x] : \Box B$. One can check these constructions are functorial, and that the terms for $\eta^m$ and $\varepsilon^m$ are natural and obey the triangle equalities for the adjunction $♠ \dashv \Box$.

Given a context Γ we define the *context term* $\Gamma \vdash c_\Gamma : [\![\Gamma]\!]$ by

– $c_\cdot \triangleq \langle\rangle$; – $c_{\Gamma, x:A} \triangleq \langle c_\Gamma, x\rangle$; – $c_{\Gamma, 🔓} \triangleq \mathsf{dia}\ c_\Gamma$.

**Lemma 3.1.** *Given* $\Gamma \vdash t : A$*,* $t$ *is definitionally equal to* $[\![\Gamma \vdash t : A]\!][c_\Gamma/x]$*.*

**Theorem 3.2 (Categorical Completeness).** *If* $\Gamma \vdash t : A$ *and* $\Gamma \vdash u : A$ *are equal in all models then they are definitionally equal.*

*Proof.* $t$ and $u$ have equal denotations in the term model, so their denotations are definitionally equal. Definitional equality is preserved by substitution, so $[\![\Gamma \vdash t : A]\!][c_\Gamma/x] = [\![\Gamma \vdash u : A]\!][c_\Gamma/x]$, so by Lemma 3.1, $t = u$.

### **4 Intuitionistic S4 for Idempotent Comonads**

Intuitionistic S4 (IS4) is the extension of IK with the axioms

$$\mathrm{T} : \Box A \to A \qquad\qquad 4 : \Box A \to \Box\Box A$$

To the category theorist IS4 naturally suggests the notion of a *comonad*. IS4 is one of the most studied and widely applied intuitionistic modal logics; in particular there exist two Fitch-style calculi for it [13,34]. We conjecture that similar results to those of the previous sections could be developed for these calculi. Instead of pursuing such a result, we here show that a simpler calculus is possible if we restrict to *idempotent* comonads, where □A and □□A are isomorphic. This restriction picks out an important class of examples (see for example the discussion of Rijke et al. [35]) and relies on a novel 'coherence' proof.

#### **4.1 Type System and Computation**

A calculus for IS4 is obtained by replacing the open rule of Fig. 1 by

$$\frac{\varGamma \vdash t : \Box A}{\varGamma, \varGamma' \vdash \mathsf{open} \, t : A}$$

The T and 4 axioms are obtained by


This confirms logical completeness; one can also easily check soundness.

Subject reduction for the β-reduction open (shut $t$) → $t$ requires a new lemma, proved by an easy induction on $t$:

**Lemma 4.1 (Lock Replacement).** *If* $\Gamma, 🔓, \Gamma'' \vdash t : A$ *then* $\Gamma, \Gamma', \Gamma'' \vdash t : A$*.*

The key syntactic Theorems 2.5, 2.6, and 2.7 then follow easily.

η-expansion obeys subject reduction as before, but it is not the case, for example, that the term presented above for the 4 axiom reduces to shut x. We may however accept a notion of η-reduction on typed terms-in-context:

$$\Gamma \vdash \mathsf{shut}\ (\mathsf{open}\ t) \longmapsto t : \Box A \ \text{ provided that } \ \Gamma \vdash t : \Box A$$

This equivalence is more powerful than it might appear; it allows us to derive the idempotence of □, as the 4 axiom is mutually inverse with the instance $\Box\Box A \to \Box A$ of the T axiom. That is, $\lambda x.\mathsf{open}\ \mathsf{shut}\ \mathsf{shut}\ \mathsf{open}\ x$ reduces to the identity on $\Box A$, and $\lambda x.\mathsf{shut}\ \mathsf{shut}\ \mathsf{open}\ \mathsf{open}\ x$ reduces to the identity on $\Box\Box A$.
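As an untyped illustration (our encoding; the η-rule is only valid on typed terms, so this is a sketch under that caveat), running the two rewrites open (shut t) → t and shut (open t) → t collapses both composite bodies to the bare variable:

```python
def step(t):
    """Apply one open/shut cancellation somewhere in t; returns (term, changed)."""
    if isinstance(t, str):                       # a variable
        return t, False
    if t[0] == 'open' and t[1][0] == 'shut':     # β: open (shut s) → s
        return t[1][1], True
    if t[0] == 'shut' and t[1][0] == 'open':     # typed η, read here as a rewrite
        return t[1][1], True
    inner, changed = step(t[1])
    return (t[0], inner), changed

def normalize(t):
    changed = True
    while changed:
        t, changed = step(t)
    return t

t_after_4 = ('open', ('shut', ('shut', ('open', 'x'))))     # T ∘ 4 at x : □A
four_after_t = ('shut', ('shut', ('open', ('open', 'x'))))  # 4 ∘ T at x : □□A
print(normalize(t_after_4), normalize(four_after_t))  # → x x
```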

#### **4.2 Categorical Semantics**

We give semantics to our type theory in a cartesian closed category with an adjunction $♠ \dashv \Box$ of endofunctors in which □ is a *comonad*. Equivalently [16, Sect. 3], ♠ is a monad, equipped with a unit $\eta$ and multiplication $\mu$. To confirm the *coherence* of these semantics, discussed in the next subsection, and the soundness of η-equivalence, we further require that ♠ is idempotent, or equivalently that all $\mu_A : ♠♠A \to ♠A$ are isomorphisms with inverses $\eta_{♠A} = ♠\eta_A$.

To define the semantics we define *lock replacement* natural transformations $l_\Gamma : [\![\Gamma]\!] \to ♠$, corresponding to Lemma 4.1, by induction on Γ:

– <sup>l</sup>· is the unit <sup>η</sup> of the monad;

– lΓ,x:<sup>A</sup> is the projection composed with l<sup>Γ</sup> ;

– lΓ,is l<sup>Γ</sup> composed with μ.

Note that lis the identity by the monad laws.

We may now define the interpretation of open: given ⟦t⟧ : ⟦Γ⟧ → ⟦□A⟧ we apply the adjunction to get an arrow ◆⟦Γ⟧ → ⟦A⟧, then compose with ℓ<sub>Γ′</sub> : ⟦Γ, Γ′⟧ → ⟦Γ, ■⟧.

**Lemma 4.2.** *If we replace part of a context with a lock, then replace part of the new context that includes the new lock, we could have done this in one step:*

*Proof.* By induction on Γ4, with the base case following by induction on Γ3.

**Lemma 4.3.** ⟦Γ, ■, Γ′ ⊢ t : A⟧ ◦ ⟦Γ′⟧(ℓ<sub>Γ″</sub>) = ⟦Γ, Γ″, Γ′ ⊢ t : A⟧*.*

*Proof.* By induction on the derivation of t.

Now open shut t, where the open has weakening Γ′, has denotation ε<sup>m</sup> ◦ ◆⟦shut t⟧ ◦ ℓ<sub>Γ′</sub>, which is ⟦t⟧ ◦ ℓ<sub>Γ′</sub> by the naturality of ε<sup>m</sup> and the adjunction. This is what is required by Lemma 4.3, so β-reduction for □ is soundly modelled.

#### **4.3 Coherence**

Because the open rule involves a weakening, and does not explicitly record in the term what that weakening is, the same typed term-in-context can be the root of multiple derivation trees, for example:

$$\dfrac{\dfrac{x:\square\square A \vdash x:\square\square A}{x:\square\square A, \blacksquare, \blacksquare \vdash \mathsf{open}\, x:\square A}}{x:\square\square A, \blacksquare, \blacksquare \vdash \mathsf{open}\,\mathsf{open}\, x: A} \qquad\qquad \dfrac{\dfrac{x:\square\square A \vdash x:\square\square A}{x:\square\square A \vdash \mathsf{open}\, x:\square A}}{x:\square\square A, \blacksquare, \blacksquare \vdash \mathsf{open}\,\mathsf{open}\, x: A}$$

The categorical semantics of the previous section is defined by induction on derivations, and so does not truly give semantics to *terms* unless any two trees with the same root must have the same denotation. In this section we show that this property, here called *coherence*, indeed holds. We make crucial use of the idempotence of the comonad □.

We first observe that if Γ, Γ′, Γ″ ⊢ t : A and all variables of Γ′ are not free in t, then Γ, Γ″ ⊢ t : A. The following lemma, proved by easy inductions, describes how the denotations of these derivations are related:

**Lemma 4.4.** *1. If* x *is not free in* t *then* ⟦Γ, x : A, Γ′ ⊢ t : B⟧ *has the same denotation as* ⟦Γ, Γ′ ⊢ t : B⟧ ◦ ⟦Γ′⟧(pr)*.*

*2.* ⟦Γ, Γ′ ⊢ t : B⟧ *has denotation* ⟦Γ, ■, Γ′ ⊢ t : B⟧ ◦ ⟦Γ′⟧(η)*.*

The technical lemma below is the only place where idempotence is used.

**Lemma 4.5.** *Given* Γ, Γ′ ⊢ t : A *with the variables of* Γ′ *not free in* t*, we have*

*where* t *on the bottom line is the original arrow with* Γ′ *strengthened away.*

*Proof.* By induction on Γ′. The base case holds by the naturality of η.

We present only the lock case: η ◦ ⟦t⟧ = ◆⟦t⟧ ◦ η by the naturality of η. But by idempotence, η : ⟦Γ, Γ′, ■⟧ → ◆⟦Γ, Γ′, ■⟧ equals ◆η. Then by Lemma 4.4, ⟦t⟧ ◦ η is ⟦Γ, Γ′ ⊢ t : A⟧, i.e. we have strengthened the lock away and can hence use our induction hypothesis, making the top trapezium commute in:

The left triangle commutes by definition, the bottom trapezium commutes by the naturality of μ, and the right triangle commutes by the monad laws.

**Lemma 4.6.** *Given* Γ, Γ′ ⊢ t : A *with the variables of* Γ′ *not free in* t*, we have*

*where the bottom* t *is obtained via strengthening.*

*Proof.* By induction on Γ. The base case follows by Lemma 4.5.

**Lemma 4.7.** *Given* Γ, Γ′ ⊢ t : □A *with the variables of* Γ′ *not free in* t*, the following arrows are equal:*


*Proof.* Immediate from Lemma 4.6.

**Theorem 4.8 (Coherence).** *Given two different derivation trees of the same typed term-in-context, their denotations are equal.*

*Proof.* By induction on the number of nodes in the trees. The base case with one node is trivial. Suppose we have n + 1 nodes. Then the induction hypothesis immediately completes the proof unless the nodes above the roots are non-equal. Then the final construction must be an instance of open, i.e. we have

$$\frac{\Gamma \vdash t : \Box A}{\Gamma, \Gamma', \Gamma'' \vdash \mathsf{open} \, t : A} \qquad \qquad \frac{\Gamma, \Gamma' \vdash t : \Box A}{\Gamma, \Gamma', \Gamma'' \vdash \mathsf{open} \, t : A}$$

Clearly any variables in Γ′ are not free in t, so we can use Lemma 4.4 on the top line of the right-hand tree to derive Γ ⊢ t : □A. By the induction hypothesis this has the same denotation as the top line of the left-hand tree. But Lemma 4.7 tells us that applying this strengthening and then opening with Γ′, Γ″ is the same as opening with Γ″ only.

We can now demonstrate the soundness of η-equivalence: given Γ ⊢ t : □A and Γ ⊢ shut open t : □A by any derivations, we can by coherence safely assume that open used one lock only as its weakening, and so the arrows are equal by the adjunction.

#### **4.4 Left Adjoints and Categorical Completeness**

Following Sect. 3 we can add ◆ to the type theory; we need only modify the dia rule to

$$\frac{\varGamma \vdash t : A}{\varGamma, \varGamma' \vdash \text{dia}\; t : \blacklozenge A}$$

to retain Lemma 4.1. The results of the previous sections, apart once more from the subformula property, still hold, where we define the denotation of ⟦Γ, Γ′ ⊢ dia t⟧ as ◆⟦t⟧ composed with ℓ<sub>Γ′</sub>. In particular, we must confirm that Lemma 3.1 extends to the new definitions of open and dia, for which we need the lemma below:

**Lemma 4.9.** *Given the term* x : ⟦Γ, Γ′⟧ ⊢ ℓ<sub>Γ′</sub> : ◆⟦Γ⟧ *defined in the term model,* ℓ<sub>Γ′</sub>[c<sub>Γ,Γ′</sub>/x] *is definitionally equal to* dia c<sub>Γ</sub>*.*

Now open t[c<sub>Γ,Γ′</sub>/x] is let dia x be (let dia x be ℓ<sub>Γ′</sub>[c<sub>Γ,Γ′</sub>/x] in dia t) in open x, which by the lemma above is let dia x be (let dia x be dia c<sub>Γ</sub> in dia t) in open x → open t[c<sub>Γ</sub>/x], which equals open t by induction. The proof for dia is similar.

### **5 Intuitionistic R**

One can readily imagine how the calculus for IS4 could be modified for logics with only one of the T and 4 axioms. In this section we instead illustrate the flexibility of Fitch-style calculi by defining a calculus for the rather different logic Intuitionistic R (IR), which extends IK with the axiom

R: A → □A

This axiom was first studied for intuitionistic necessity modalities by Curry [12], along with the axiom M, □□A → □A, to develop a logic for monads. The importance of the logic with R but without M was established by McBride and Paterson [30], who showed that it captured the useful programming abstraction of *applicative functors*. We take the name R for the axiom from Fairtlough and Mendler [18], and for the logic from Litak [28].
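The connection with applicative functors can be illustrated concretely (a hedged Python sketch, names ours: R plays the role of the applicative `pure`, while the K axiom □(A → B) → □A → □B plays the role of `<*>`; the zip-list applicative is a stock example validating R and K but not M):

```python
from itertools import repeat

# R axiom as 'pure': embed a value into the zip-list applicative.
# (Conceptually an infinite constant list -- do not list() it directly.)
def pure(x):
    return repeat(x)

# K axiom as 'ap': apply a list of functions pointwise to a list of arguments.
def ap(fs, xs):
    return (f(x) for f, x in zip(fs, xs))

# fmap is derivable from pure and ap:
def fmap(f, xs):
    return ap(pure(f), xs)

example = list(ap(fmap(lambda a: lambda b: a + b, [1, 2, 3]), [10, 20, 30]))
# example == [11, 22, 33]
```

No 'join' of type □□A → □A (the M axiom) is definable for zip-lists in a law-abiding way, which is exactly the sense in which applicative functors are weaker than monads.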

We modify Figs. 1 and 3 simply by removing the side-conditions ■ ∉ Γ′ from the variable, open, and dia rules. We can then derive R:

$$\frac{x:A, \square \vdash x:A}{x:A \vdash \mathsf{shut}\, x:\square A}$$

For substitution and subject reduction we require the following lemma, easily proved by induction on the derivation of t:

#### **Lemma 5.1 (Lock Weakening).** *If* Γ, Γ′ ⊢ t : A *then* Γ, ■, Γ′ ⊢ t : A*.*

We can also observe that η-equivalence preserves types in both directions.

We give semantics for this calculus in a cartesian closed category equipped with an adjunction ◆ ⊣ □ of endofunctors and a 'point' natural transformation r : Id → □ preserved by □, i.e. □r = r<sub>□</sub> : □A → □□A. This last property makes this model slightly less general than the notion of tensorial strength used for categorical semantics by McBride and Paterson [30], but is needed for coherence and the soundness of η-equivalence. We will use the arrow ◆A → A defined by applying the adjunction to r; we call this q and note the property:

**Lemma 5.2.** ◆q = q<sub>◆</sub> : ◆◆A → ◆A*.*

The *weakening* natural transformations w<sub>Γ′</sub> : ⟦Γ, Γ′⟧ → ⟦Γ⟧ are defined by induction on Γ′ via projection and q. Variables are then denoted by projection composed with weakening, and weakening is used similarly for open and dia. We can hence show the soundness of β-reduction for □ and ◆. For the soundness of η-equivalence for □ we need the following lemma:

#### **Lemma 5.3.** w<sub>Γ′,■</sub> = w<sub>■,Γ′</sub> : ⟦Γ, ■, Γ′, ■⟧ → ⟦Γ, ■⟧*.*

The denotation of Γ, ■, Γ′ ⊢ shut open t is ε<sup>m</sup> ◦ ⟦t⟧ ◦ w<sub>Γ′,■</sub> ◦ η<sup>m</sup>. By the above lemma we replace w<sub>Γ′,■</sub> with w<sub>■,Γ′</sub>, so by the naturality of η<sup>m</sup> we have ε<sup>m</sup> ◦ η<sup>m</sup> ◦ ⟦t⟧ ◦ w<sub>■,Γ′</sub>, which is ⟦t⟧ ◦ w<sub>■,Γ′</sub> by the monad laws.

Moving to coherence, we conduct a similar induction to Theorem 4.8, considering the case

$$\frac{\Gamma \vdash t : \Box A}{\Gamma, \blacksquare, \Gamma', \blacksquare, \Gamma'' \vdash \mathsf{open} \ t : A} \qquad\qquad \frac{\Gamma, \blacksquare, \Gamma' \vdash t : \Box A}{\Gamma, \blacksquare, \Gamma', \blacksquare, \Gamma'' \vdash \mathsf{open} \ t : A}$$

The top line on the left weakens to the top line on the right, with denotation ⟦t⟧ ◦ w<sub>■,Γ′</sub>. By induction this equals the denotation of the top line on the right. Then the right-hand term has denotation ε<sup>m</sup> ◦ ⟦t⟧ ◦ w<sub>■,Γ′</sub> ◦ w<sub>■,Γ″</sub>. But by Lemma 5.3 w<sub>■,Γ′</sub> = w<sub>Γ′,■</sub>, and it is clear that w<sub>■,Γ′</sub> ◦ w<sub>■,Γ″</sub> = w<sub>■,Γ′,■,Γ″</sub>, which is exactly the weakening used on the left. Coherence for dia follows similarly.

Moving finally to categorical completeness, in the term model □t ◦ r is shut t[open shut x/x], which reduces to shut t, so r is natural. □r : □A → □□A is shut shut open x, which is indeed η-equal to shut x, i.e. to r<sub>□</sub>.

We finally need to update Lemma 3.1 for our new definitions. We do this via a lemma similar to Lemma 4.9:

**Lemma 5.4.** *Given the term* x : ⟦Γ, Γ′⟧ ⊢ w<sub>Γ′</sub> : ⟦Γ⟧ *defined in the term model,* w<sub>Γ′</sub>[c<sub>Γ,Γ′</sub>/x] *is definitionally equal to* c<sub>Γ</sub>*.*

Now the denotation of Γ, x : A, Γ′ ⊢ x : A is π<sub>2</sub> ◦ w<sub>Γ′</sub>. Therefore we have π<sub>2</sub>w<sub>Γ′</sub>[c<sub>Γ,A,Γ′</sub>/x], which is π<sub>2</sub>c<sub>Γ,A</sub> by the lemma above. This is π<sub>2</sub>⟨c<sub>Γ</sub>, x⟩, which reduces to x.

The denotation of Γ, ■, Γ′ ⊢ open t : A is let dia x be w<sub>Γ′</sub> in open t. Applying the substitution [c<sub>Γ,■,Γ′</sub>/x] along with the lemma above yields the term let dia x be dia c<sub>Γ</sub> in open t → open t[c<sub>Γ</sub>/x], and induction completes the proof. The calculations for dia follow similarly.

### **6 Related and Further Work**

*Conventional contexts.* Lambda calculi with conventional contexts containing typed variables only have been proposed for the logic of monads [32], for IS4 [5], for IK [4], and for a logic with 'Löb induction' [6], from which one can extract a calculus for IR. In previous work [11] we developed the *guarded lambda calculus* featuring two modalities, where one ('constant') was an (idempotent) comonad, and the other ('later') supported a notion of guarded recursion corresponding to Löb induction. We therefore used the existing work [5,6] 'off the shelf'.

Problems arose when we attempted to extend our calculus with dependent types [7]. Neither of the calculi with conventional contexts we had used scaled well to this extension. The calculus for IS4 [5], whose terms involved explicit substitutions, turned out to require these substitutions on types also, which added a level of complexity that made it difficult to write even quite basic dependently typed programs. The constant modality was therefore jettisoned in favour of an approach based on clock quantification [1], of which more below. The calculus for later employed a connective ⊛ (from McBride and Paterson [30]) which acted on function spaces under the modality. However with dependent types we need to act not merely on function spaces but on Π-types, and ⊛ could no longer be used. Instead a novel notion of 'delayed substitution' was introduced. These were given an equational theory, but some of these equations could not be directed, so they did not give rise to a useful notion of computation.

*Modalities as quantifiers.* The suggestive but formally rather underdeveloped paper of De Queiroz and Gabbay [14] proposed that necessity modalities should be treated as universal quantifiers, inspired by the standard semantics of necessity as 'for all possible worlds'. This is one way to understand the relationship between the constant modality and clock quantification [1]. However clock quantification is more general than a single constant modality because we can identify multiple free clock variables with multiple 'dimensions' in which a type may or may not be constant. This gap in generality can probably be bridged by using multiple independent constant modalities. More problematically, while it is clear what the denotational semantics of the constant modality are, the best model for clock quantifiers yet found [8] is rather complicated and still leaves open some problems with coherence in the presence of a universe.

*Previous Fitch-style calculi.* The Fitch-style approach was pioneered, apparently independently, by Martini and Masini [29] and Borghuis [9]. Martini and Masini's work is rather notationally heavy, and weakening appears not to be admissible. Borghuis's calculus for IK is excellent, but his calculi for stronger logics are not so compelling, as each different axiom is expressed with another version of the open or shut rules, not all of which compute when combined. The calculus for IS4 of Pfenning and Wong [34], refined by Davies and Pfenning [13, Sect. 4], provides the basis of the IS4 calculus of this paper, but involves some complications which appear to correlate to not assuming idempotence. We have extended this previous work by investigating the subformula property, introducing categorical semantics, and showing how left adjoints to necessity modalities à la tense logic can be used as types. Finally, the recent clocked type theory of Bahr et al. [3] independently gave a treatment of the later modality that on inspection is precisely Fitch-style (albeit with named 'locks'), and which has better computational properties than the delayed substitution approach.

*Dual contexts.* Davies and Pfenning [13] use a pair of contexts Δ; Γ with intended meaning □Δ ∧ Γ. This is quite different from the semantics of Fitch-style sequents, where structure in the context denotes the *left adjoint* of □. In recent work Kavvos [24] has shown that dual contexts may capture a number of different modal logics, and the approach has been used as a foundation for both pen-and-paper mathematics [37] and, via an Agda fork [40], formalisation [26]. We support this work but there is reason to explore other options. First, writing programs with dual context calculi was described by Davies and Pfenning themselves as 'somewhat awkward', and in the same paper they suggest the Fitch-style approach as a less awkward alternative. Indeed, Fitch's approach was exactly designed to capture 'natural' modal deduction. Second, any application with multiple interacting modalities is unlikely to be accommodated in a mere two zones; the *mode theories* of Licata et al. [27] extend the dual zone approach to a richer setting in which interacting modalities, substructural contexts, and even Fitch-style natural deduction can be expressed<sup>2</sup>, but the increase in complexity is considerable and much work remains to be done.

*Further logics and algorithmic properties.* We wish to bring more logics into the Fitch-style framework, in particular the logic of the later modality, extending IR with the strong Löb axiom (□A → A) → A. The obvious treatment of this axiom does not terminate, but Bahr et al. [3] suggest that this can be managed by giving names to locks. We would further like to develop calculi with multiple modalities. This is easy to do by assigning each modality its own lock; two IK modalities give exactly the intuitionistic tense logic of Goré et al. [22]. The situation is rather more interesting where the modalities interact, as with the later and constant modalities. Finally, we would like to further investigate algorithmic properties of Fitch-style calculi such as type checking, type inference, and η-expansion and other notions of computation. In particular, we wonder if a notion of commuting conversion can be defined so that the calculi with ◆ enjoy the subformula property.

### **References**


<sup>2</sup> We are grateful to an anonymous reviewer for this last observation.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Realizability Interpretation and Normalization of Typed Call-by-Need** *λ***-calculus with Control**

Étienne Miquey<sup>1,2(B)</sup> and Hugo Herbelin<sup>2</sup>

<sup>1</sup> Équipe Gallinette, Inria, LS2N (CNRS), Université de Nantes, Nantes, France etienne.miquey@inria.fr

<sup>2</sup> Équipe πr<sup>2</sup>, Inria, IRIF (CNRS), Université Paris-Diderot, Paris, France herbelin@inria.fr

**Abstract.** We define a variant of Krivine realizability where realizers are pairs of a term and a substitution. This variant allows us to prove the normalization of a simply-typed call-by-need λ-calculus with control due to Ariola *et al.* Indeed, in such a call-by-need calculus, substitutions have to be delayed until it is known whether an argument is really needed. We then extend the proof to a call-by-need λ-calculus equipped with a type system equivalent to classical second-order predicate logic, representing one step towards proving the normalization of the call-by-need classical second-order arithmetic introduced by the second author to provide a proof-as-program interpretation of the axiom of dependent choice.

### **1 Introduction**

#### **1.1 Realizability-Based Normalization**

Normalization by realizability is a standard technique to prove the normalization of typed λ-calculi. Originally introduced by Tait [36] to prove the normalization of System T, it was extended by Girard to prove the normalization of System F [11]. This kind of technique, also called normalization by reducibility or normalization by logical relations, works by interpreting each type as a set of typed or untyped terms seen as realizers of the type, then showing that the way these sets of realizers are built preserves properties such as normalization. Over the years, this method has been used and generalized in many ways; for a more detailed account we refer the reader to the work of Gallier [9].

Realizability techniques were adapted to the normalization of various calculi for classical logic (see e.g. [3,32]). A specific framework tailored to the study of realizability for classical logic has been designed by Krivine [19] on top of a λ-calculus with control whose reduction is defined in terms of an abstract machine. Within this machinery, terms are evaluated in front of stacks, and control (thus classical logic) is made available through the possibility of saving and restoring stacks. During the last twenty years, Krivine's classical realizability turned out to be fruitful both from the point of view of logic, leading to the construction of new models of set theory, and generalizing in particular the technique of Cohen's forcing [20–22]; and on its computational facet, providing alternative tools for the analysis of the computational content of classical programs<sup>1</sup>.
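A toy version of such a machine can be written in a few lines (our own simplification for illustration, not Krivine's full definition): terms, paired with environments, are evaluated in front of a stack, and a call/cc constant saves the current stack as a value that can later be restored.

```python
# A minimal Krivine-style abstract machine with call/cc (an illustrative
# sketch; encodings are ours). De Bruijn indices; terms are tagged tuples:
#   ("var", n) | ("lam", body) | ("app", t, u) | ("callcc",) | ("kont", stack)
# A closure is (term, environment); the machine state is (closure, stack).

def step(closure, stack):
    term, env = closure
    tag = term[0]
    if tag == "var":                       # look the variable up in the environment
        return env[term[1]], stack
    if tag == "app":                       # push the argument closure onto the stack
        return (term[1], env), [(term[2], env)] + stack
    if tag == "lam" and stack:             # pop the top of the stack into the environment
        return (term[1], [stack[0]] + env), stack[1:]
    if tag == "callcc" and stack:          # save the current stack as a constant
        return stack[0], [(("kont", stack[1:]), [])] + stack[1:]
    if tag == "kont" and stack:            # restore the saved stack
        return stack[0], term[1]
    return None                            # no rule applies: final state

def run(term):
    state = ((term, []), [])
    while True:
        nxt = step(*state)
        if nxt is None:
            return state
        state = nxt

identity = ("lam", ("var", 0))
```

For instance, `run(("app", identity, identity))` terminates with the identity closure facing an empty stack, and wrapping a term in call/cc lets it capture and reinstall the surrounding stack, which is the machine-level reading of classical control.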

Noteworthily, Krivine realizability is one of the approaches contributing to advocating the motto that, through the Curry-Howard correspondence, with new programming instructions come new reasoning principles<sup>2</sup>. Our original motivation for the present work is actually in line with this idea, in the sense that our long-term purpose is to give a realizability interpretation to dPA<sup>ω</sup>, a call-by-need calculus defined by the second author [15]. In this calculus, lazy evaluation is indeed a fundamental ingredient in order to obtain an executable proof term for the axiom of dependent choice.

#### **1.2 Contributions of the Paper**

In order to address the normalization of typed call-by-need λ-calculus, we design a variant of Krivine's classical realizability, where the realizers are closures (a term with a substitution for its free variables). The call-by-need λ-calculus with control that we consider is the λ[lvτ]-calculus. This calculus, which was defined by Ariola *et al.* [2], is syntactically described in an extension with explicit substitutions of the λμμ̃-calculus [6,14,29]. The syntax of the λμμ̃-calculus itself refines the syntax of the λ-calculus by syntactically distinguishing between *terms* and *evaluation contexts*. It also contains *commands* which combine terms and evaluation contexts so that they can interact together. Thinking of evaluation contexts as stacks and commands as states, the λμμ̃-calculus can also be seen as a syntax for abstract machines. From a proof-as-program point of view, the λμμ̃-calculus and its variants can be seen as a term syntax for proofs of Gentzen's sequent calculus. In particular, the λμμ̃-calculus contains control operators which give a computational interpretation to classical logic.
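The three-way syntactic division just described can be sketched with tagged tuples (an illustrative encoding, constructor tags ours):

```python
# The three syntactic categories of the lambda-mu-mutilde calculus (a sketch):
#   terms    t ::= x | lambda x.t | mu alpha.c
#   contexts e ::= alpha | t . e | mutilde x.c
#   commands c ::= <t || e>

def command(t, e):
    return ("cmd", t, e)

def is_mu_redex(c):        # <mu alpha.c' || e>   ->  c'[e/alpha]
    return c[1][0] == "mu"

def is_mutilde_redex(c):   # <t || mutilde x.c'>  ->  c'[t/x]
    return c[2][0] == "mutilde"

c1 = command(("mu", "a", command(("var", "x"), ("covar", "a"))), ("covar", "b"))
c2 = command(("var", "x"), ("mutilde", "y", command(("var", "y"), ("covar", "b"))))
# The critical pair: both rules apply to <mu alpha.c || mutilde x.c'>.
c3 = command(("mu", "a", c1), ("mutilde", "y", c2))
```

The command c3, where both rules apply, is the well-known critical pair whose resolution distinguishes call-by-name from call-by-value; call-by-need instead delays the μ̃ binding into a store, as described below.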

We give a proof of normalization first for the simply-typed λ[lvτ]-calculus<sup>3</sup>, then for a type system with first-order and second-order quantification. While we only apply our technique to the normalization of the λ[lvτ]-calculus, our interpretation incidentally suggests a way to adapt Krivine realizability to other call-by-need settings. This paves the way to the computational interpretation of classical proofs using lazy evaluation or shared memory cells, including the case of the call-by-need second order arithmetic dPA<sup>ω</sup> [15].

<sup>1</sup> See for instance [27] about witness extraction or [12,13] about specification problems.

<sup>2</sup> For instance, one way to realize the axiom of dependent choice in classical realizability is by means of an extra instruction quote [18].

<sup>3</sup> Even though it has not been done formally, the normalization of the λlv-calculus presented in [2] should also be derivable from Polonowski's proof of strong normalization of the non-deterministic λμμ̃-calculus [35]. The λlv-calculus (a big-step variant of the λ[lvτ]-calculus introduced in Ariola *et al.*) is indeed a particular evaluation strategy for the λμμ̃-calculus, so that the strong normalization of the non-deterministic variant of the latter should imply the normalization of the former as a particular case.

### **2 The** *λ***[***lvτ* **]-calculus**

#### **2.1 The Call-by-Need Evaluation Strategy**

The call-by-need evaluation strategy of the λ-calculus evaluates arguments of functions only when needed, and, when needed, shares their evaluations across all places where the argument is required. Call-by-need evaluation is at the heart of a functional programming language such as Haskell. It has in common with the call-by-value evaluation strategy that all places where a same argument is used share the same value. Nevertheless, it observationally behaves like the call-by-name evaluation strategy (for the pure λ-calculus), in the sense that a given computation eventually evaluates to a value if and only if it evaluates to the same value (up to inner reduction) along the call-by-name evaluation. In particular, in a setting with non-terminating computations, it is not observationally equivalent to the call-by-value evaluation. Indeed, if the evaluation of a useless argument loops in the call-by-value evaluation, the whole computation loops, which is not the case of call-by-name and call-by-need evaluations.
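Since Python is call-by-value, the observational difference just described can be made concrete by representing call-by-name/need arguments as explicit thunks (a toy example, not from the paper):

```python
# Modelling lazy arguments as thunks (zero-argument functions).
def loop():
    while True:        # a non-terminating computation
        pass

def const_k(x, unused_thunk):
    return x           # never forces its second argument

# Under call-by-name/need the thunk is passed unevaluated, so this terminates:
result = const_k(42, lambda: loop())
# Call-by-value would evaluate the argument first, i.e. const_k(42, loop()),
# and the whole computation would loop.
```

Call-by-need additionally shares the forced value between all uses of the argument, which is what the stores of Sect. 2.2 make explicit.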

These three evaluation strategies can be turned into equational theories. For call-by-name and call-by-value, this was done by Plotkin through continuation-passing-style (CPS) semantics characterizing these theories [34]. For the call-by-need evaluation strategy, a specific equational theory reflecting the intensional behavior of the strategy into a semantics was proposed independently by Ariola and Felleisen [1], and by Maraist et al. [26]. A continuation-passing-style semantics was proposed in the 90s by Okasaki et al. [30]. However, this semantics does not ensure normalization of simply-typed call-by-need evaluation, as shown in [2], thus failing to ensure a property which holds in the simply-typed call-by-name and call-by-value cases.

Continuation-passing-style semantics *de facto* gives a semantics to the extension of the λ-calculus with control operators<sup>4</sup>. In particular, even though call-by-name and call-by-need are observationally equivalent on the pure λ-calculus, their different intensional behaviors induce different CPS semantics, leading to different observational behaviors when control operators are considered. On the other hand, the semantics of calculi with control can also be reconstructed from an analysis of the duality between programs and their evaluation contexts, and the duality between the let construct (which binds programs) and a control operator such as Parigot's μ (which binds evaluation contexts). Such an analysis can be done in the context of the λμμ̃-calculus [6,14].

In the call-by-name and call-by-value cases, the approach based on the λμμ̃-calculus leads to continuation-passing-style semantics similar to the ones given by Plotkin or, in the call-by-name case, also to the one by Lafont et al. [23]. As for call-by-need, the λlv-calculus, a call-by-need version of the λμμ̃-calculus, is defined in [2]. A continuation-passing-style semantics is then defined via a calculus called λ[lvτ] [2]. This semantics, which differs from Okasaki, Lee and Tarditi's [30], is the object of study in this paper.

<sup>4</sup> That is to say with operators such as Scheme's callcc, Felleisen's <sup>C</sup>, <sup>K</sup>, or <sup>A</sup> operators [8], Parigot's μ and [ ] operators [31], Crolard's catch and throw operators [5].

#### **2.2 Explicit Environments**

While the results presented in this paper could be directly expressed using the λlv-calculus, the realizability interpretation naturally arises from the decomposition of this calculus into a different calculus with an explicit *environment*, the λ[lvτ]-calculus [2]. Indeed, as we shall see in the sequel, the decomposition highlights different syntactic categories that are deeply involved in the type system and in the definition of the realizability interpretation.

The λ[lvτ]-calculus is a reformulation of the λlv-calculus with explicit environments, called *stores* and denoted by τ. Stores consist of a list of bindings of the form [x := t], where x is a term variable and t a term, and of bindings of the form [α := e] where α is a context variable and e a context. For instance, in the closure cτ[x := t]τ′, the variable x is bound to t in c and τ′. Besides, the term t might be an unevaluated term (*i.e.* lazily stored), so that if x is eagerly demanded at some point during the execution of this closure, t will be reduced in order to obtain a value. In the case where t indeed produces a value V, the store will be updated with the binding [x := V]. However, a binding of this form (with a value) is fixed for the rest of the execution. As such, our so-called stores somewhat behave like lazy explicit substitutions or mutable environments.
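The behaviour of these bindings can be mimicked with a mutable cell that memoizes its first forcing (an illustrative Python sketch with invented names; the calculus itself is purely syntactic):

```python
# A sketch of the store discipline: a binding starts out unevaluated; the
# first demand evaluates it and overwrites the binding with the value, which
# is then fixed, so every later demand shares it.
class Binding:
    def __init__(self, thunk):
        self.value = None
        self.evaluated = False
        self.thunk = thunk

    def force(self):
        if not self.evaluated:            # demanded for the first time
            self.value = self.thunk()     # reduce the lazily stored term
            self.evaluated = True         # update [x := V]; fixed from now on
        return self.value

evaluations = 0

def heavy():
    global evaluations
    evaluations += 1
    return 21

x = Binding(heavy)              # [x := heavy], lazily stored
total = x.force() + x.force()   # the second demand reuses the stored value
# total == 42 and heavy ran exactly once
```

The counter witnesses the sharing: however many times the binding is demanded, the stored term is reduced only once, exactly the "mutable environment" behaviour described above.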

To draw the comparison between our structures and the usual notions of stores and environments, two things should be observed. First, the usual notion of store refers to a structure of list that is fully mutable, in the sense that the cells can be updated at any time and thus values might be replaced. Second, the usual notion of environment designates a structure in which variables are bound to closures made of a term and an environment. In particular, terms and environments are duplicated, *i.e.* sharing is not allowed. Such a structure resembles a tree whose nodes are decorated by terms, as opposed to a machinery allowing sharing (like ours) whose underlying structure is broadly a directed acyclic graph. See for instance [24] for a Krivine abstract machine with sharing.

#### **2.3 Syntax and Reduction Rules**

The lazy evaluation of terms allows us to reduce a command ⟨μα.c ∥ μ̃x.c′⟩ to the command c′ together with the binding [x := μα.c]:

$$\langle \mu \alpha.c \| \tilde{\mu} x.c' \rangle \to c'[x := \mu \alpha.c],$$

In this case, the term μα.c is left unevaluated ("frozen") in the store, until possibly reaching a command in which the variable x is needed. When evaluation reaches a command of the form ⟨x ∥ F⟩τ[x := μα.c]τ′, the binding is opened and the term is evaluated in front of the context μ̃[x].⟨x ∥ F⟩τ′:

$$\langle x \| F \rangle \tau[x := \mu \alpha.c] \tau' \to \langle \mu \alpha.c \| \tilde{\mu}[x]. \langle x \| F \rangle \tau' \rangle \tau$$

The reader can think of the previous rule as the "defrosting" operation of the frozen term μα.c: this term is evaluated in the prefix of the store τ which predates it, in front of the context μ̃[x].⟨x ∥ F⟩τ′ where the μ̃[x] binder is waiting for a value.


**Fig. 1.** Syntax and reduction rules of the λ[lvτ]-calculus

This context keeps track of the part of the store τ′ that was originally located after the binding [x := ...]. This way, if a value V is indeed furnished for the binder μ̃[x], the original command ⟨x ∥ F⟩ is evaluated in the updated full store:

$$\langle V \| \tilde{\mu}[x].\langle x \| F \rangle \tau' \rangle \tau \to \langle V \| F \rangle \tau[x := V] \tau'$$

The brackets in μ̃[x].c are used to express the fact that the variable x is forced at top level (unlike contexts of the shape μ̃x.C[⟨x ∥ F⟩] in the λlv-calculus). The reduction system resembles that of an abstract machine. In particular, it allows us to keep the standard redex at the top of a command and avoids searching through the meta-context for work to be done.

Note that our approach slightly differs from [2] since we split values into two categories: strong values (v) and weak values (V). The strong values correspond to values strictly speaking. The weak values include the variables, which force the evaluation of the terms to which they refer into shared strong values. Their evaluation may require capturing a continuation. The syntax of the language, which includes constants **k** and co-constants *κ*, is given in Fig. 1. As for the reduction →, we define it as the compatible reflexive transitive closure of the rules given in Fig. 1.

The different syntactic categories can be understood as the different levels of alternation in a context-free abstract machine (see [2]): the priority is first given to contexts at level e (lazy storage of terms), then to terms at level t (evaluation of μα into values), then back to contexts at level E and so on until level v. These different categories are directly reflected in the definition of the abstract machine defined in [2], and will thus be involved in the definition of our realizability interpretation. We chose to highlight this by distinguishing different types of sequents already in the typing rules that we shall now present.

**Fig. 2.** Typing rules of the λ[lvτ]-calculus

### **2.4 A Type System for the** *λ***[***lvτ* **]-calculus**

We have nine kinds of (one-sided) sequents, one for typing each of the nine syntactic categories. We write them with an annotation on the ⊢ sign, using one of the letters v, V, t, F, E, e, l, c, τ. Sequents typing values and terms assert a type, with the type written on the right; sequents typing contexts expect a type A, with the type written A⊥; sequents typing commands and closures are black boxes neither asserting nor expecting a type; sequents typing substitutions instantiate a typing context. In other words, we have the following nine kinds of sequents:

$$\begin{array}{lll} \Gamma \vdash\_l l & \quad \Gamma \vdash\_t t : A & \quad \Gamma \vdash\_e e : A^\perp\\ \Gamma \vdash\_c c & \quad \Gamma \vdash\_V V : A & \quad \Gamma \vdash\_E E : A^\perp\\ \Gamma \vdash\_\tau \tau : \Gamma' & \quad \Gamma \vdash\_v v : A & \quad \Gamma \vdash\_F F : A^\perp \end{array}$$

where types and typing contexts are defined by:

$$A, B ::= X \mid A \to B \qquad\qquad \Gamma ::= \varepsilon \mid \Gamma, x : A \mid \Gamma, \alpha : A^{\perp}$$

The typing rules are given in Fig. 2, where we assume that a variable x (resp. co-variable α) only occurs once in a context Γ (we implicitly allow the renaming of variables by α-conversion). We also adopt the convention that constants **k** and co-constants *κ* come with a signature S which assigns them a type. This type system enjoys the property of subject reduction.

**Theorem 1 (Subject reduction).** *If* Γ ⊢<sub>l</sub> cτ *and* cτ → c′τ′*, then* Γ ⊢<sub>l</sub> c′τ′*.*

*Proof.* By induction on typing derivations.

### **3 Normalization of the** *λ***[***lvτ* **]-calculus**

#### **3.1 Normalization by Realizability**

The proof of normalization for the λ[lvτ]-calculus that we present in this section is inspired by techniques of Krivine's classical realizability [19], whose notations we borrow. It is actually also very close to a proof by reducibility<sup>5</sup>. In a nutshell, to each type A is associated a set |A|<sub>t</sub> of terms whose execution is guided by the structure of A. These terms are the ones usually called *realizers* in Krivine's classical realizability. Their definition is in fact indirect, and is done by orthogonality to a set of "correct" computations, called a *pole*. The choice of this set is central when studying models induced by classical realizability for second-order logic, but in the present case we only pay attention to the particular pole of terminating computations. This is one of the differences from the usual proofs by reducibility, where everything is done with respect to SN, while our definitions are parametric in the pole (which is chosen to be SN in the end). The adequacy lemma, which is the central piece, consists in proving that typed terms belong to the corresponding sets of realizers, and are thus normalizing.

In more detail, our proof can be sketched as follows. First, we generalize the usual notion of closed term to the notion of closed *term-in-store*. Intuitively, this is because we are no longer interested in closed terms and in substitutions to close open terms, but rather in terms that are closed when considered in the current store. This is based on the simple observation that a store is nothing more than a shared substitution whose content might evolve along the execution. Second, we define the notion of *pole* ⊥⊥: a pole is a set of closures closed under anti-reduction and store extension. In particular, the set of normalizing closures is a valid pole. This allows us to relate terms and contexts through a notion of orthogonality with respect to the pole. We then define for each formula A and typing level o (one of e, t, E, V, F, v) a set |A|<sub>o</sub> (resp. ‖A‖<sub>o</sub>) of terms (resp. contexts) in the corresponding syntactic category. These sets correspond to reducibility candidates, or to what are usually called truth values and falsity values in Krivine realizability. Finally, the core of the proof consists in the adequacy lemma, which shows that any closed term of type A at level o is in the corresponding set |A|<sub>o</sub>. This guarantees that any typed closure is in any pole, and in particular in the pole of normalizing closures. Technically, the proof of adequacy evaluates in each case a state of an abstract machine (in our case a closure), so that the proof also proceeds by evaluation. A more detailed explanation of this observation, as well as a more introductory presentation of normalization proofs by classical realizability, is given in an article by Dagand and Scherer [7].

### **3.2 Realizability Interpretation for the** *λ***[***lvτ* **]-calculus**

We begin by defining some key notions for stores that we shall need further in the proof.

<sup>5</sup> See for instance the proof of normalization for system D presented in [17, Sect. 3.2].

**Definition 2 (Closed store).** *We extend the notion of free variable to stores:*

$$\begin{array}{rcl} FV(\varepsilon) & \stackrel{\scriptstyle \Delta}{=} & \emptyset \\ FV(\tau[x:=t]) & \stackrel{\scriptstyle \Delta}{=} & FV(\tau) \cup \{ y \in FV(t) : y \notin \mathsf{dom}(\tau) \} \\ FV(\tau[\alpha:=E]) & \stackrel{\scriptstyle \Delta}{=} & FV(\tau) \cup \{ \beta \in FV(E) : \beta \notin \mathsf{dom}(\tau) \} \end{array}$$

*so that we can define a* closed store *to be a store* τ *such that* FV(τ) = ∅*.*
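As a quick sanity check of this definition, here is a hypothetical Python rendering, in which a store is a list of (name, term) bindings and a term is queried for its free variables through a caller-supplied function:

```python
def fv_store(tau, fv_term):
    """FV(tau[x := t]) = FV(tau) ∪ {y ∈ FV(t) : y ∉ dom(tau)}:
    earlier bindings of the store bind occurrences in later terms."""
    dom, free = set(), set()
    for x, t in tau:
        free |= {y for y in fv_term(t) if y not in dom}
        dom.add(x)
    return free
```

With terms represented simply by their sets of free variables (so that `fv_term` is the identity), the binding [x := ...] makes x bound in every later term of the store, exactly as in the equations above.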

**Definition 3 (Compatible stores).** *We say that two stores* τ *and* τ′ *are* independent *and write* τ # τ′ *when* dom(τ) ∩ dom(τ′) = ∅*. We say that they are* compatible *and write* τ ⋄ τ′ *whenever for all variables* x *(resp. co-variables* α*) present in both stores (*x ∈ dom(τ) ∩ dom(τ′)*), the corresponding terms (resp. contexts) in* τ *and* τ′ *coincide. Finally, we say that* τ′ *is an* extension *of* τ *and write* τ ◁ τ′ *whenever* dom(τ) ⊆ dom(τ′) *and* τ ⋄ τ′*.*

We denote by $\overline{\tau\tau'}$ the compatible union join(τ, τ′) of closed stores τ and τ′, defined by:

$$\begin{array}{rcll}
\mathsf{join}(\tau_0[x:=t]\tau_1,\ \tau_0'[x:=t]\tau_1') & \triangleq & \tau_0\tau_0'[x:=t]\,\mathsf{join}(\tau_1,\tau_1') & (\text{if } \tau_0 \mathbin{\#} \tau_0')\\
\mathsf{join}(\tau,\tau') & \triangleq & \tau\tau' & (\text{if } \tau \mathbin{\#} \tau')\\
\mathsf{join}(\varepsilon,\tau) & \triangleq & \tau\\
\mathsf{join}(\tau,\varepsilon) & \triangleq & \tau
\end{array}$$

The following lemma (which follows easily from the previous definition) states the main property we will use about union of compatible stores.

**Lemma 4.** *If* τ *and* τ′ *are two compatible stores, then* τ ◁ $\overline{\tau\tau'}$ *and* τ′ ◁ $\overline{\tau\tau'}$*. Besides, if* τ *is of the form* τ<sub>0</sub>[x := t]τ<sub>1</sub>*, then* $\overline{\tau\tau'}$ *is of the form* τ<sub>2</sub>[x := t]τ<sub>3</sub> *with* τ<sub>0</sub> ◁ τ<sub>2</sub> *and* τ<sub>1</sub> ◁ τ<sub>3</sub>*.*

*Proof.* This follows easily from the previous definition.
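Under a hypothetical list-of-bindings representation of stores, the compatible union can be sketched as follows; the recursion splits at the first shared variable, as in the defining equations, and the assertion checks the compatibility requirement that shared bindings coincide:

```python
def join(tau, tau2):
    """Compatible union of two stores, each a list of (name, term) bindings.
    This encoding is illustrative, not the paper's formalism."""
    dom2 = {x for x, _ in tau2}
    i = next((i for i, (x, _) in enumerate(tau) if x in dom2), None)
    if i is None:
        return list(tau) + list(tau2)  # independent stores: concatenation
    x, t = tau[i]
    j = next(j for j, (y, _) in enumerate(tau2) if y == x)
    assert tau2[j][1] == t  # compatibility: the shared binding must coincide
    return tau[:i] + tau2[:j] + [(x, t)] + join(tau[i + 1:], tau2[j + 1:])
```

On this encoding one can also observe the first claim of Lemma 4: every binding of each input store survives in the result, so both inputs are extended by the union.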

As we explained in the introduction of this section, we will not consider closed terms in the usual sense. Indeed, while it is frequent in proofs of normalization (*e.g.* by realizability or reducibility) of a calculus to consider only closed terms, and to perform substitutions to maintain the closure of terms, this only makes sense if it corresponds to the computational behavior of the calculus. For instance, to prove the normalization of λx.t in the typed call-by-name λμμ̃-calculus, one would consider a substitution ρ that is suitable with respect to the typing context Γ, then a context u · e of type A → B, and evaluate:

$$
\langle \lambda x.t\_\rho \| u \cdot e \rangle \quad \to \quad \langle t\_\rho [u/x] \| e \rangle
$$

Then we would observe that t<sub>ρ</sub>[u/x] = t<sub>ρ[x:=u]</sub> and deduce that ρ[x := u] is suitable for Γ, x : A, which would allow us to conclude by induction.

However, in the λ[lvτ]-calculus we do not perform global substitution when reducing a command, but rather add a new binding [x := u] in the store:

$$\langle \lambda x.t \| u \cdot E \rangle \tau \quad \rightarrow \quad \langle t \| E \rangle \tau [x := u]$$

Therefore, the natural notion of closed term involves closure under a store, which might evolve during the rest of the execution (in contrast with a substitution).
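The contrast between global substitution and lazy storage fits in one line. In the same hypothetical encoding as above (stores as lists of bindings), a β-step never touches the body t and merely extends the store:

```python
def beta_step(x, t, u, E, store):
    """<lambda x.t || u . E> tau  ->  <t || E> tau[x := u]:
    the argument u is stored unevaluated instead of being substituted."""
    return t, E, store + [(x, u)]
```

This is why closure of t must be assessed relative to the store that accompanies it: the binding for x exists only there.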

**Definition 5 (Term-in-store).** *We call* closed term-in-store *(resp.* closed context-in-store*,* closed closure*) the combination of a term* t *(resp. context* e*, command* c*) with a closed store* τ *such that* FV(t) ⊆ dom(τ)*. We use the notation* (t|τ) *(resp.* (e|τ), (c|τ)*) to denote such a pair.*

We should note that in particular, if t is a closed term, then (t|τ) is a term-in-store for any closed store τ. The notion of closed term-in-store is thus a generalization of the notion of closed term, and we will (ab)use this terminology in the sequel. We denote the set of closed closures by C<sub>0</sub>, and identify (c|τ) with the closure cτ when c is closed in τ. Observe that if cτ is a closure in C<sub>0</sub> and τ′ is a store extending τ, then cτ′ is also in C<sub>0</sub>. We are now equipped to define the notion of pole, and to verify that the set of normalizing closures is indeed a valid pole.

**Definition 6 (Pole).** *A subset* ⊥⊥ ⊆ C<sub>0</sub> *is said to be* saturated *or* closed by anti-reduction *whenever for all* (c|τ), (c′|τ′) ∈ C<sub>0</sub>*, if* c′τ′ ∈ ⊥⊥ *and* cτ → c′τ′*, then* cτ ∈ ⊥⊥*. It is said to be* closed by store extension *if whenever* cτ ∈ ⊥⊥*, then for any store* τ′ *extending* τ *(*τ ◁ τ′*),* cτ′ ∈ ⊥⊥*. A* pole *is any subset of* C<sub>0</sub> *that is closed by anti-reduction and by store extension.*

The following proposition is the one supporting the claim that our realizability proof is almost a reducibility proof whose definitions have been generalized with respect to a pole instead of the fixed set SN.

**Proposition 7.** *The set* ⊥⊥<sub>⇓</sub> = {cτ ∈ C<sub>0</sub> : cτ *normalizes*} *is a pole.*

*Proof.* As we only consider closures in C<sub>0</sub>, both conditions (closure by anti-reduction and by store extension) are clearly satisfied.


**Definition 8 (Orthogonality).** *Given a pole* ⊥⊥*, we say that a term-in-store* (t|τ) *is* orthogonal *to a context-in-store* (e|τ′)*, and write* (t|τ)⊥⊥(e|τ′)*, if* τ *and* τ′ *are compatible and* ⟨t‖e⟩$\overline{\tau\tau'}$ ∈ ⊥⊥*.*

*Remark 9.* The reader familiar with Krivine's forcing machine [20] might recognize his definition of orthogonality between terms of the shape (t, p) and stacks of the shape (π, q), where p and q are forcing conditions<sup>6</sup>:

$$(t, p) \bot (\pi, q) \Leftrightarrow (t \star \pi, p \land q) \in \bot$$

<sup>6</sup> The meet of forcing conditions is indeed a refinement containing somewhat the "union" of information contained in each, just like the union of two compatible stores.

We can now relate closed terms and contexts by orthogonality with respect to a given pole. This allows us to define, for any formula A, the sets |A|<sub>v</sub>, |A|<sub>V</sub>, |A|<sub>t</sub> (resp. ‖A‖<sub>F</sub>, ‖A‖<sub>E</sub>, ‖A‖<sub>e</sub>) of realizers (or reducibility candidates) at levels v, V, t (resp. F, E, e) for the formula A. It is to be observed that realizers are here closed terms-in-store.

### **Definition 10 (Realizers).** *Given a fixed pole* ⊥⊥*, we set:*

$$\begin{array}{lcl}
|X|_v &=& \{(\mathbf{k}|\tau) :\ \vdash \mathbf{k}:X\}\\
|A \to B|_v &=& \{(\lambda x.t|\tau) : \forall u\tau',\ \tau \diamond \tau' \land (u|\tau') \in |A|_t \Rightarrow (t|\overline{\tau\tau'}[x:=u]) \in |B|_t\}\\
\|A\|_F &=& \{(F|\tau) : \forall v\tau',\ \tau \diamond \tau' \land (v|\tau') \in |A|_v \Rightarrow (v|\tau')\bot\!\!\!\bot(F|\tau)\}\\
|A|_V &=& \{(V|\tau) : \forall F\tau',\ \tau \diamond \tau' \land (F|\tau') \in \|A\|_F \Rightarrow (V|\tau)\bot\!\!\!\bot(F|\tau')\}\\
\|A\|_E &=& \{(E|\tau) : \forall V\tau',\ \tau \diamond \tau' \land (V|\tau') \in |A|_V \Rightarrow (V|\tau')\bot\!\!\!\bot(E|\tau)\}\\
|A|_t &=& \{(t|\tau) : \forall E\tau',\ \tau \diamond \tau' \land (E|\tau') \in \|A\|_E \Rightarrow (t|\tau)\bot\!\!\!\bot(E|\tau')\}\\
\|A\|_e &=& \{(e|\tau) : \forall t\tau',\ \tau \diamond \tau' \land (t|\tau') \in |A|_t \Rightarrow (t|\tau')\bot\!\!\!\bot(e|\tau)\}
\end{array}$$

*Remark 11.* We draw the reader's attention to the fact that we should actually write |A|<sub>v</sub><sup>⊥⊥</sup>, ‖A‖<sub>F</sub><sup>⊥⊥</sup>, etc., and τ ⊩<sub>⊥⊥</sub> Γ, because the corresponding definitions are parameterized by a pole ⊥⊥. As is common in Krivine's classical realizability, we ease the notation by removing the annotation ⊥⊥ whenever there is no ambiguity on the pole. Besides, it is worth noting that while co-constants do not occur directly in the definitions, they may still appear in the realizers by means of the pole.

While the definition of the different sets might seem complex at first sight, we claim that they are quite natural with regard to the methodology of Danvy's semantic artifacts presented in [2]. Indeed, having an abstract machine in context-free form (the last step in this methodology before deriving the CPS) allows us to have both the term and the context (in a command) behave independently of each other. Intuitively, a realizer at a given level is precisely a term which is going to behave well (be in the pole) in front of any opponent chosen in the previous level (in the hierarchy v, F, V, etc.). For instance, in a call-by-value setting, there are only three levels of definition (values, contexts and terms) in the interpretation, because the abstract machine in context-free form also has three. Here the ground level corresponds to strong values, and each other level is defined as the terms (or contexts) which are well-behaved in front of any opponent at the previous one. The definitions of the different sets |A|<sub>v</sub>, ‖A‖<sub>F</sub>, |A|<sub>V</sub>, etc. directly stem from this intuition.

In comparison with the usual definition of Krivine's classical realizability, we only considered orthogonal sets restricted to some syntactical subcategories. However, the definition still satisfies the usual monotonicity properties of biorthogonal sets:

**Proposition 12.** *For any type* <sup>A</sup> *and any given pole* ⊥⊥*, we have:*

$$1.\ |A|_v \subseteq |A|_V \subseteq |A|_t; \qquad\qquad 2.\ \|A\|_F \subseteq \|A\|_E \subseteq \|A\|_e.$$

*Proof.* All the inclusions are proved in a similar way. We only give the proof of |A|<sub>v</sub> ⊆ |A|<sub>V</sub>. Let ⊥⊥ be a pole and let (v|τ) be in |A|<sub>v</sub>. We want to show that (v|τ) is in |A|<sub>V</sub>, that is to say that v is in the syntactic category V (which is true), and that for any (F|τ′) ∈ ‖A‖<sub>F</sub> such that τ ⋄ τ′, we have (v|τ)⊥⊥(F|τ′). The latter holds by definition of (F|τ′) ∈ ‖A‖<sub>F</sub>, since (v|τ) ∈ |A|<sub>v</sub>.
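This inclusion is a pure biorthogonality fact, and can be replayed on a toy finite model in which "terms", "contexts" and the pole are arbitrary finite sets of labels (all names and data below are made up for illustration):

```python
def orth_r(players, opponents, pole):
    """Opponents orthogonal to every player (e.g. building ||A||_F from |A|_v)."""
    return {o for o in opponents if all((p, o) in pole for p in players)}

def orth_l(opponents, players, pole):
    """Players orthogonal to every opponent (e.g. building |A|_V from ||A||_F)."""
    return {p for p in players if all((p, o) in pole for o in opponents)}
```

Starting from a set Av of "strong values", the set AF = orth_r(Av, ...) contains exactly the opponents orthogonal to all of Av, so every element of Av is orthogonal to all of AF and hence lands in AV = orth_l(AF, ...): the inclusion Av ⊆ AV holds by construction, mirroring |A|<sub>v</sub> ⊆ |A|<sub>V</sub>.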

We now extend the notion of realizers to stores, by stating that a store τ realizes a context Γ if it binds all the variables x and α in Γ to a realizer of the corresponding formula.

**Definition 13.** *Given a closed store* τ *and a fixed pole* ⊥⊥*, we say that* τ realizes Γ*, which we write*<sup>7</sup> τ ⊩ Γ*, if:*

*1. for any* (x : A) ∈ Γ*,* τ ≡ τ<sub>0</sub>[x := t]τ<sub>1</sub> *and* (t|τ<sub>0</sub>) ∈ |A|<sub>t</sub>*;*
*2. for any* (α : A<sup>⊥</sup>) ∈ Γ*,* τ ≡ τ<sub>0</sub>[α := E]τ<sub>1</sub> *and* (E|τ<sub>0</sub>) ∈ ‖A‖<sub>E</sub>*.*

Just as the weakening rules (for the typing context) are admissible at each level of the type system:

$$\frac{\Gamma \vdash\_t t : A \quad \Gamma \subseteq \Gamma'}{\Gamma' \vdash\_t t : A} \qquad \frac{\Gamma \vdash\_e e : A^\perp \quad \Gamma \subseteq \Gamma'}{\Gamma' \vdash\_e e : A^\perp} \quad \dots \quad \frac{\Gamma \vdash\_\tau \tau : \Gamma'' \quad \Gamma \subseteq \Gamma'}{\Gamma' \vdash\_\tau \tau : \Gamma''}$$

the definition of realizers is compatible with a weakening of the store.

**Lemma 14 (Store weakening).** *Let* τ *and* τ′ *be two stores such that* τ ◁ τ′*, let* Γ *be a typing context and let* ⊥⊥ *be a pole. The following statements hold:*


*Proof.* 1. Straightforward from the definition of $\overline{\tau\tau'}$.


### **Definition 15 (Adequacy).** *Given a fixed pole* ⊥⊥*, we say that:*

*– A typing judgment* Γ ⊢<sub>t</sub> t : A *is* adequate *(w.r.t. the pole* ⊥⊥*) if for all stores* τ ⊩ Γ*, we have* (t|τ) ∈ |A|<sub>t</sub>*.*

<sup>7</sup> Once again, we should formally write τ ⊩<sub>⊥⊥</sub> Γ, but we will omit the annotation ⊥⊥ as often as possible.

*– More generally, we say that an inference rule*

$$\frac{J_1 \quad \cdots \quad J_n}{J_0}$$

*is adequate (w.r.t. the pole* ⊥⊥*) if the adequacy of all typing judgments* J1,...,J<sup>n</sup> *implies the adequacy of the typing judgment* J0*.*

*Remark 16.* From the latter definition, it is clear that a typing judgment that is derivable from a set of adequate inference rules is adequate too.

We will now show the main result of this section, namely that the typing rules of Fig. 2 for the λ[lvτ]-calculus without co-constants are adequate with any pole. Observe that this result requires considering the λ[lvτ]-calculus without co-constants. Indeed, we consider co-constants as coming with their typing rules, potentially giving them any type (whereas constants can only be given an atomic type). Thus, there is *a priori* no reason<sup>8</sup> why their types should be adequate with any pole.

However, as observed in the previous remark, given a fixed pole it suffices to check whether the typing rules for a given co-constant are adequate with this pole. If they are, any judgment that is derivable using these rules will be adequate.

**Theorem 17 (Adequacy).** *If* Γ *is a typing context,* ⊥⊥ *is a pole and* τ *is a store such that* τ ⊩ Γ*, then the following holds in the* λ[lvτ]*-calculus without co-constants:*


*Proof.* The different statements are proved by mutual induction over typing derivations. We only give the most important cases here.

**Rule** (→l). Assume that

$$\frac{\Gamma \vdash_t u : A \qquad \Gamma \vdash_E E : B^{\perp}}{\Gamma \vdash_F u \cdot E : (A \to B)^{\perp}}\ (\to_l)$$

and let ⊥⊥ be a pole and τ a store such that τ ⊩ Γ. Let (λx.t|τ′) be a closed term in the set |A → B|<sub>v</sub> such that τ ⋄ τ′. Then we have:

$$\begin{array}{ccccc}\langle \lambda x.t \| u \cdot E \rangle \overline{\tau \tau'} & \rightarrow & \langle u \| \tilde{\mu} x. \langle t \| E \rangle \rangle \overline{\tau \tau'} & \rightarrow & \langle t \| E \rangle \overline{\tau \tau'} [x := u] \end{array}$$

<sup>8</sup> Think for instance of a co-constant of type (A → B)<sup>⊥</sup>: there is no reason why it should be orthogonal to any function in |A → B|<sub>v</sub>.

By definition of |A → B|v, this closure is in the pole, and we can conclude by anti-reduction.

**Rule** (x). Assume that

$$\frac{(x : A) \in \Gamma}{\Gamma \vdash_V x : A}\ (x)$$

and let ⊥⊥ be a pole and τ a store such that τ ⊩ Γ. As (x : A) ∈ Γ, we know that τ is of the form τ<sub>0</sub>[x := t]τ<sub>1</sub> with (t|τ<sub>0</sub>) ∈ |A|<sub>t</sub>. Let (F|τ′) be in ‖A‖<sub>F</sub>, with τ ⋄ τ′. By Lemma 4, we know that $\overline{\tau\tau'}$ is of the form $\overline{\tau_0}[x := t]\overline{\tau_1}$. Hence we have:

$$\langle x \| F \rangle \overline{\tau\_0} [x := t] \overline{\tau\_1} \quad \rightarrow \quad \langle t \| \tilde{\mu} [x]. \langle x \| F \rangle \overline{\tau\_1} \rangle \overline{\tau\_0}$$

and it suffices by anti-reduction to show that the last closure is in the pole ⊥⊥. By induction hypothesis, we know that (t|$\overline{\tau_0}$) ∈ |A|<sub>t</sub>, thus we only need to show that it is put in front of a catchable context in ‖A‖<sub>E</sub>. This corresponds exactly to the next case, which we shall prove now.

**Rule** (˜μ[]). Assume that

$$\frac{\Gamma, x : A, \Gamma' \vdash_F F : A^{\perp} \qquad \Gamma, x : A \vdash_{\tau} \tau' : \Gamma'}{\Gamma \vdash_E \tilde{\mu}[x].\langle x \| F \rangle \tau' : A^{\perp}}\ (\tilde{\mu}^{[]})$$

and let ⊥⊥ be a pole and τ a store such that τ ⊩ Γ. Let (V|τ<sub>0</sub>) be a closed term in |A|<sub>V</sub> such that τ<sub>0</sub> ⋄ τ. We have:

$$\langle V \| \tilde{\mu}[x].\langle x \| F \rangle \overline{\tau'} \rangle \overline{\tau\_0 \tau} \quad \rightarrow \quad \langle V \| F \rangle \overline{\tau\_0 \tau} [x := V] \tau'$$

By induction hypothesis, we obtain τ[x := V]τ′ ⊩ Γ, x : A, Γ′. Up to α-conversion in F and τ′, so that the variables in τ′ are disjoint from those in τ<sub>0</sub>, we have $\overline{\tau_0\tau}$ ⊩ Γ (by Lemma 14) and then $\overline{\tau_0\tau}$[x := V]τ′ ⊩ Γ, x : A, Γ′. By induction hypothesis again, we obtain that (F|$\overline{\tau_0\tau}$[x := V]τ′) ∈ ‖A‖<sub>F</sub> (this was an assumption in the previous case), and as (V|τ<sub>0</sub>) ∈ |A|<sub>V</sub>, we finally get that (V|τ<sub>0</sub>)⊥⊥(F|$\overline{\tau_0\tau}$[x := V]τ′) and conclude again by anti-reduction.

**Corollary 18.** *If* cτ *is a closure such that* ⊢<sub>l</sub> cτ *is derivable, then for any pole* ⊥⊥ *such that the typing rules for the co-constants used in the derivation are adequate with* ⊥⊥*, we have* cτ ∈ ⊥⊥*.*

We can now put our focus back on the normalization of typed closures. As we already saw in Proposition 7, the set ⊥⊥<sub>⇓</sub> of normalizing closures is a valid pole, so it only remains to prove that any typing rule for co-constants is adequate with ⊥⊥<sub>⇓</sub>.

**Lemma 19.** *Any typing rule for co-constants is adequate with the pole* ⊥⊥<sub>⇓</sub>*,* i.e. *if* Γ *is a typing context,* τ *is a store such that* τ ⊩ Γ*, and* *κ* *is a co-constant such that* Γ ⊢<sub>F</sub> *κ* : A<sup>⊥</sup>*, then* (*κ*|τ) ∈ ‖A‖<sub>F</sub>*.*

*Proof.* This lemma directly stems from the observation that for any store τ′ and any closed strong value (v|τ′) ∈ |A|<sub>v</sub>, the closure ⟨v‖*κ*⟩$\overline{\tau\tau'}$ does not reduce and thus belongs to the pole ⊥⊥<sub>⇓</sub>.

As a consequence, we obtain the normalization of typed closures of the full calculus.

**Theorem 20.** *If* cτ *is a closure of the* λ[lvτ]*-calculus such that* ⊢<sub>l</sub> cτ *is derivable, then* cτ *normalizes.*

This is to be contrasted with Okasaki, Lee and Tarditi's semantics for the call-by-need λ-calculus, which is not normalizing in the simply-typed case, as shown in Ariola *et al.* [2].

#### **3.3 Extension to 2nd-Order Type Systems**

We focused in this article on simply typed versions of the λlv- and λ[lvτ]-calculi. But, as is common in Krivine classical realizability, first- and second-order quantification (in Curry style) comes for free through the interpretation. This means that we can, for instance, extend the language of types to first- and second-order predicate logic:

$$\begin{array}{l} e\_1, e\_2 ::= x \mid f(e\_1, \ldots, e\_k) \\ A, B ::= X(e\_1, \ldots, e\_k) \mid A \to B \mid \forall x. A \mid \forall X. A \end{array}$$

We can then define the following introduction rules for universal quantifications:

$$\frac{\Gamma \vdash_v v : A \quad x \notin FV(\Gamma)}{\Gamma \vdash_v v : \forall x.A}\ (\forall_r^1) \qquad\qquad \frac{\Gamma \vdash_v v : A \quad X \notin FV(\Gamma)}{\Gamma \vdash_v v : \forall X.A}\ (\forall_r^2)$$

Observe that these rules need to be restricted to the level of strong values, just as they are restricted to values in the call-by-value case<sup>9</sup>. As for the left rules, they can be defined at any level; let us give the most general one, e:

$$\frac{\Gamma \vdash_e e : (A[n/x])^{\perp}}{\Gamma \vdash_e e : (\forall x.A)^{\perp}}\ (\forall_l^1) \qquad\qquad \frac{\Gamma \vdash_e e : (A[B/X])^{\perp}}{\Gamma \vdash_e e : (\forall X.A)^{\perp}}\ (\forall_l^2)$$

where n is any natural number and B any formula. The usual (call-by-value) interpretation of the quantification is defined as an intersection over all the possible instantiations of the variables within the model. We do not wish to enter into too many details<sup>10</sup> on this topic here, but first-order variables are to be instantiated by integers, while second-order variables are to be instantiated by subsets of terms at the lower level, *i.e.* closed strong values in store (the set of which we write V<sub>0</sub>):

$$|\forall x. A|\_v = \bigcap\_{n \in \mathbb{N}} |A[n/x]|\_v \qquad \qquad |\forall X. A|\_v = \bigcap\_{S \in \mathbb{N}^k \to \mathcal{P}(\mathcal{V}\_0)} |A[S/X]|\_v$$

<sup>9</sup> For further explanation on the need for a value restriction in Krivine realizability, we refer the reader to [29] or [25].

<sup>10</sup> Once again, we advise the interested reader to refer to [29] or [25] for further details.

where the variable X is of arity k. It is then routine to check that the typing rules are adequate with the realizability interpretation.

### **4 Conclusion and Further Work**

In this paper, we presented a system of simple types for a call-by-need calculus with control, which we proved to be safe in the sense that it satisfies subject reduction (Theorem 1) and that typed terms are normalizing (Theorem 20). We proved normalization by means of a realizability-inspired interpretation of the λ[lvτ]-calculus. Incidentally, this opens the door to the computational analysis (in the spirit of Krivine realizability) of classical proofs using control, laziness and shared memory.

In further work, we intend to present two extensions of the present paper. First, following the definition of the realizability interpretation, we managed to type the continuation-and-store passing style translation for the λ[lvτ]-calculus (see [2]). Interestingly, typing the translation emphasizes its computational content, and in particular, the store-passing part is reflected in a Kripke forcing-like manner of typing the extensibility of the store [28, Chap. 6].

Second, on a different aspect, the realizability interpretation we introduced could be a first step towards new ways of realizing axioms. In particular, the first author used in his Ph.D. thesis [28, Chap. 8] the techniques presented in this paper to give a normalization proof for dPA<sup>ω</sup>, a proof system developed by the second author [15]. Indeed, this proof system makes it possible to define a proof for the axiom of dependent choice thanks to the use of streams that are lazily evaluated, and was lacking a proper normalization proof.

Finally, to determine the range of our technique, it would be natural to investigate the relation between our framework and the many different presentations of call-by-need calculi (with or without control). Amongst other calculi, we could cite Chang and Felleisen's presentation of call-by-need [4], Garcia *et al.*'s lazy calculus with delimited control [10], or Kesner's recent paper on normalizing by-need terms characterized by an intersection type system [16]. To this end, we might rely on Pédrot and Saurin's classical by-need [33]. They indeed relate (classical) call-by-need with linear head reduction from a computational point of view, and draw connections with the presentations of Ariola *et al.* [2] and Chang and Felleisen [4]. Ariola *et al.*'s λlv-calculus being close to the λ[lvτ]-calculus (see [2] for further details), our technique is likely to be adaptable to their framework, and thus to Pédrot and Saurin's system.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Quotient Inductive-Inductive Types**

Thorsten Altenkirch<sup>1</sup> , Paolo Capriotti<sup>1</sup> , Gabe Dijkstra<sup>3</sup> , Nicolai Kraus1(B) , and Fredrik Nordvall Forsberg<sup>2</sup>

<sup>1</sup> University of Nottingham, Nottingham, UK {thorsten.altenkirch,paolo.capriotti,nicolai.kraus}@nottingham.ac.uk <sup>2</sup> University of Strathclyde, Glasgow, Scotland fredrik.nordvall-forsberg@strath.ac.uk <sup>3</sup> London, UK gabe.dijkstra@gmail.com

**Abstract.** Higher inductive types (HITs) in Homotopy Type Theory allow the definition of datatypes which have constructors for equalities over the defined type. HITs generalise quotient types and make it possible to define types with non-trivial higher equality structure, such as spheres, suspensions and the torus. However, there are also interesting uses of HITs to define types satisfying uniqueness of equality proofs, such as the Cauchy reals, the partiality monad, and the well-typed syntax of type theory. In each of these examples we define several types that depend on each other mutually, i.e. they are inductive-inductive definitions. We call such HITs quotient inductive-inductive types (QIITs). Although there has been recent progress on a general theory of HITs, there is not yet a theoretical foundation for the combination of equality constructors and induction-induction, despite many interesting applications. In the present paper we take a first step towards a semantic definition of QIITs. In particular, we give an initial-algebra semantics. We further derive a *section induction principle*, stating that every algebra morphism into the algebra in question has a section, which is close to the intuitively expected elimination rules.

### **1 Introduction**

This paper is about type theory in the sense of Martin-Löf [29], a theory on which proof assistants such as Coq [7] and Lean [14], as well as programming languages such as Agda [31] and Idris [8], are based. Recently, homotopy type theory (HoTT) [34] has been introduced, inspired by homotopy-theoretic interpretations of type theory by Awodey and Warren [5] and Voevodsky [25,36].

A central concept in type theory is that of inductive definitions, which allow us to define inductive datatypes such as the natural numbers, lists and trees simply by presenting constructors with strictly positive occurrences of the inductive type being defined. Using the propositions-as-types explanation, we can use the same mechanism to inductively define predicates and relations, such as an order on the natural numbers, or the derivability predicate for a logic defined by rules. Conceptually, HoTT changes what we mean by an inductive definition, because we view a type not only as given by its elements (points) but also by its equality proofs

© The Author(s) 2018. C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 293–310, 2018. https://doi.org/10.1007/978-3-319-89366-2\_16

(paths). Hence an inductive definition may not only feature constructors for elements but also for equalities. This concept of higher inductive types (HITs) has been used to represent the homotopical structure of geometric objects, like circles, spheres and tori, and gives rise to synthetic homotopy theory in HoTT [32].

However, as already noted in the HoTT Book [34], HITs also have more quotidian applications, such as a definition of the Cauchy reals for which the use of the axiom of choice can be avoided when proving e.g. Cauchy completeness. Instead of defining the real numbers as a quotient of sequences of rationals, a HIT is used to define them as the Cauchy completion of the rational numbers, with the quotienting happening simultaneously with the completion. Similarly, a definition of the partiality monad, which represents potentially diverging operations over a given type, was given using a HIT [2,13,35], again avoiding the axiom of choice when showing e.g. that the construction is a monad [12].

As we see from these examples, the idea of generating points and equalities of a type inductively is interesting even if we do not care about the higher equality structure of types, or do not want it. For example, consider trees branching over an arbitrary type $A$, quotiented by arbitrary permutations of subtrees. We first define the type $T_0(A)$ of $A$-branching trees, given by the constructors

$$\begin{aligned} \mathsf{leaf}\_0 &: T\_0(A) \\ \mathsf{node}\_0 &: (A \to T\_0(A)) \to T\_0(A) . \end{aligned}$$

We then form the binary relation $R$ on $T_0(A)$ that we want to quotient by as follows: $R$ is the smallest relation such that, first, for any auto-equivalence on $A$ (i.e. any $e : A \to A$ which has an inverse) and any $f : A \to T_0(A)$, we have a proof $p_{f,e} : R(\mathsf{node}_0(f), \mathsf{node}_0(f \circ e))$, and, second, for $g, h : A \to T_0(A)$ such that $(n : A) \to R(g(n), h(n))$, we have a proof $c_{g,h} : R(\mathsf{node}_0(g), \mathsf{node}_0(h))$. We can then form the quotient type $T_0(A)/R$, which is the type of unlabelled trees where each node has an $A$-indexed family of subtrees, and two trees which agree up to the "order" of their subtrees are equal. For $A \equiv \mathbf{2}$, these are binary trees where the order of the two subtrees of each node does not matter.
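This construction can be sketched in Lean 4 (a hedged illustration: the names `Auto`, `T0`, `R`, and `PermTree` are ours, a bijection record stands in for the auto-equivalences $e : A \cong A$, and we elide set-truncation):

```lean
-- auto-equivalences of A, given as a map with an explicit inverse
structure Auto (A : Type) where
  toFun    : A → A
  invFun   : A → A
  leftInv  : ∀ a, invFun (toFun a) = a
  rightInv : ∀ a, toFun (invFun a) = a

-- A-branching trees T₀(A)
inductive T0 (A : Type) : Type where
  | leaf : T0 A
  | node : (A → T0 A) → T0 A

-- R is the smallest relation generated by permutation of subtrees
-- (the proofs p_{f,e}) and by congruence (the proofs c_{g,h})
inductive R (A : Type) : T0 A → T0 A → Prop where
  | perm : (f : A → T0 A) → (e : Auto A) →
      R A (T0.node f) (T0.node (f ∘ e.toFun))
  | cong : (g h : A → T0 A) → (∀ a, R A (g a) (h a)) →
      R A (T0.node g) (T0.node h)

-- the quotient type T₀(A)/R
def PermTree (A : Type) : Type := Quot (R A)
```

Note that `Quot` only provides the usual quotient induction principle, so the lifting problem discussed next applies to this sketch as well.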

Now, morally, from a family $A \to (T_0(A)/R)$, we should be able to construct an element of the quotient $T_0(A)/R$. This is indeed possible if $A$ is $\mathbf{2}$ or another finite type, by applying the induction principle of the quotient type once for each element of $A$. However, it seems that, for a general type $A$, this would require the axiom of choice [34], which unfortunately is not a constructive principle [15]. But using a higher inductive type, we can give an alternative definition of the type of $A$-branching trees modulo permutation of subtrees.

*Example 1.* Given a type A, we define T(A) : hSet by

$$\begin{aligned} \mathsf{leaf} &: T(A) \\ \mathsf{node} &: (A \to T(A)) \to T(A) \\ \mathsf{mix} &: (f : A \to T(A)) \to (e : A \cong A) \to \mathsf{node}(f) = \mathsf{node}(f \circ e). \end{aligned}$$

Note that the fact that $T(A)$ is a homotopy set (see *preliminaries* below) is implicitly included in the statement $T(A) : \mathsf{hSet}$. The construction we were looking for is now directly given by the constructor $\mathsf{node}$. This use of higher inductive constructions to increase the strength of quotients was first discussed by Altenkirch and Kaposi [1], where such set-truncated HITs are called *quotient inductive types* (QITs).

Another example of the use of higher inductive types is *type theory in type theory* [1], where the well-typed syntax of type theory is implemented as a higher *inductive-inductive* [30] type in type theory itself. A significantly simplified version of this will serve as a running example for us:

*Example 2.* We define the syntax of a (very basic) type theory by constructing types representing contexts and types as follows. A set $\mathsf{Con} : \mathsf{hSet}$ and a type family $\mathsf{Ty} : \mathsf{Con} \to \mathsf{hSet}$ are simultaneously defined by giving the constructors

$$\begin{array}{ll}\varepsilon: & \mathsf{Con} \\ \mathsf{ext}: & (\varGamma : \mathsf{Con}) \to \mathsf{Ty}(\varGamma) \to \mathsf{Con} \\ \iota: & (\varGamma : \mathsf{Con}) \to \mathsf{Ty}(\varGamma) \\ \sigma: & (\varGamma : \mathsf{Con}) \to (A : \mathsf{Ty}(\varGamma)) \to \mathsf{Ty}(\mathsf{ext}\,\varGamma\,A) \to \mathsf{Ty}(\varGamma) \\ \sigma_{\mathsf{eq}}: & (\varGamma : \mathsf{Con}) \to (A : \mathsf{Ty}(\varGamma)) \to (B : \mathsf{Ty}(\mathsf{ext}\,\varGamma\,A)) \\ & \qquad \to \mathsf{ext}\,(\mathsf{ext}\,\varGamma\,A)\,B =_{\mathsf{Con}} \mathsf{ext}\,\varGamma\,(\sigma\,\varGamma\,A\,B). \end{array}$$

For simplicity, we do not consider terms. Contexts are either the empty context $\varepsilon$, or an extended context $\mathsf{ext}\,\varGamma\,A$, representing the context $\varGamma$ extended by a fresh variable of type $A$. Types are either the base type $\iota$ (well-typed in any context), or $\Sigma$-types represented by $\sigma\,\varGamma\,A\,B$ (well-typed in context $\varGamma$ if $A$ is well-typed in context $\varGamma$, and $B$ is well-typed in the extended context $\mathsf{ext}\,\varGamma\,A$). Type theory in type theory as in [1] has plenty of equality constructors, which play a role as soon as terms are introduced. To keep the example simple, we instead use another equality, stating that extending a context by $A$ and then by $B$ is equal to extending it by $\sigma\,\varGamma\,A\,B$. This equality is given by $\sigma_{\mathsf{eq}}$. Note that it is not possible to list the constructors of $\mathsf{Con}$ and $\mathsf{Ty}$ separately: due to the mutual dependency, the $\mathsf{Ty}$-constructor $\sigma$ has to be given between the two $\mathsf{Con}$-constructors $\mathsf{ext}$ and $\sigma_{\mathsf{eq}}$.

Despite a lot of work making use of concrete HITs [4,9–11,23,26,27], and despite the fact that it is usually — on some intuitive level — clear to the expert how the elimination principle for such a HIT can be derived, giving a general specification and a theoretical foundation for HITs has turned out to be a major difficulty. Several approaches have been proposed [6,18,28,33], and they do indeed give a satisfactory specification of HITs in the sense that they cover all HITs which have been used so far (see *related work* below). However, to the best of our knowledge, no approach covers *higher inductive-inductive* definitions such as Example 2. The purpose of the current paper is to remedy this. We restrict ourselves to sets, i.e. to *quotient inductive-inductive types* (QIITs). This is of course a serious restriction, since it means that we cannot capture many ordinary HITs, such as the circle $S^1$. At the same time, all higher inductive-inductive types that we know of — the Cauchy reals, the surreal numbers, the partiality monad, type theory in type theory, permutable trees — are indeed sets, and all will be instances of our framework, which allows arbitrarily complicated dependency structures. In particular, we allow intermixing of constructors as in Example 2.

**Contributions.** We give a formal specification of quotient inductive-inductive types with arbitrary dependency structure. This can be viewed as the generalisation, to quotient inductive-inductive types, of the usual semantics of inductive types as initial algebras of a functor. A QIIT is specified by (i) its *sorts*, which encode the types and type families that it consists of (Sect. 2), and (ii) a sequence of *constructors*, which in turn are specified by *argument* and *target functors* (Sect. 3). This is a very general framework, covering in particular point constructors (Sect. 3.2) and path constructors (Sect. 3.4). Each constructor specification gives rise to a category of algebras, and we establish conditions on the target functors which allow us to conclude that these categories of algebras are complete (Sect. 3.5). This is important, because it allows us to prove the equivalence of initiality and a principle that we call *section induction* (Sect. 4), stating that every algebra morphism into the algebra in question has a section; this principle is close to the intuitively expected elimination rules.

A full version of the paper, including all proofs, is available on the arXiv [3].

**Related Work.** Sojakova [33] shows the correspondence between initiality and induction (a variant of our Theorem 31) for W-suspensions, a restricted class of HITs. Basold, Geuvers and van der Weide [6] introduce a syntactic schema for HITs without higher path constructors, and derive their elimination rules. Dybjer and Moeneclaey [18] give a syntactic schema for finitary HITs with at most paths between paths, and give an interpretation in Hofmann and Streicher's groupoid model [22]. Finally, Lumsdaine and Shulman's work on the semantics of HITs in model categories [28] is similar to an external version of our approach.

**Preliminaries.** We work in a standard Martin-Löf style type theory and assume function extensionality. We do not assume univalence, but do not contradict it either; in particular, everything we do works in the type theory of the HoTT Book [34]. We write $\mathcal{U}$ for "the" universe of types, omitting universe indices in the *typical ambiguity* style [21]. A type is a set if all its equality proofs are equal, and $\mathsf{hSet}$ is defined as $\Sigma(A : \mathcal{U}).\text{is-set}(A)$; we implicitly treat elements of $\mathsf{hSet}$ as their first projections — this allows us to view $\mathsf{hSet}$ as a universe. By a *category*, we mean a precategory [34, Definition 9.1.1] in the sense of the HoTT Book (all our categories become univalent categories if univalence is assumed). We write $\mathcal{C} \Rightarrow \mathcal{D}$ for functors and $X \to Y$ for functions between types. We denote the obvious category of sets and functions by $\mathsf{hSet}$ as well; consequently, $F : A \to \mathsf{hSet}$ denotes a type family, while $F : \mathcal{C} \Rightarrow \mathsf{hSet}$ denotes a functor. For such a functor $F : \mathcal{C} \Rightarrow \mathsf{hSet}$, we write $\int^{\mathcal{C}} F$ for the *category of elements* of $F$, whose objects are pairs $(X, x)$ of an object $X$ in $\mathcal{C}$ and an element $x : F X$. For a function $f : X \to Y$ and $z, w : X$, we write $\mathsf{ap}\,f : z = w \to f(z) = f(w)$ for the usual "action of a function on paths", $(-)^{-1} : x = y \to y = x$ for "path reversal", and $- \cdot - : x = y \to (y = z \to x = z)$ for "path concatenation" [34, Lemmas 2.2.1, 2.1.1, 2.1.2].

### **2 Sorts**

Single inductive (and quotient inductive) sets are simply elements of $\mathsf{hSet}$. Inductive families [17] indexed over some fixed type $A$ are families $A \to \mathsf{hSet}$. For the inductive-inductive definitions we are considering, the situation is more complicated, since we allow very general dependency structures. Our only requirement is that there is no looping dependency, since this is easily seen to lead to contradictions; e.g. we do not allow the definition of a family $A : B \to \mathsf{hSet}$ mutually with a family $B : A \to \mathsf{hSet}$ (whatever this would mean). Concretely, we will ensure that the collection of type formation rules (the type signatures) is given in a valid order, and we refer to the types used as family indices as the *sorts* of the definition. Hence our first step towards a specification of general QIITs is to explain what a valid specification of the sorts is.

Sorts do not only determine the formation rules of the inductive definitions, but also the types of the eliminators. To capture this, it is not enough to specify a type of sorts — in order to take the shape of the elimination rules into account, we need to specify a category.

**Definition 3 (Sort specifications).** *A specification of the* sorts *of a quotient inductive-inductive definition of* n *types is given by a list*

$$H\_0, H\_1, \dots, H\_{n-1},$$

*where each* $H_i$ *is a functor* $H_i : \mathcal{C}_i \Rightarrow \mathsf{hSet}$*. Here,* $\mathcal{C}_0 :\equiv \mathbf{1}$ *is the terminal category, and* $\mathcal{C}_{i+1}$ *is defined as follows: objects of* $\mathcal{C}_{i+1}$ *are pairs* $(X, P)$ *of an object* $X$ *of* $\mathcal{C}_i$ *and a family* $P : H_i(X) \to \mathsf{hSet}$*, and morphisms* $(X, P) \to (Y, Q)$ *are pairs* $(f, g)$ *of a morphism* $f : X \to Y$ *in* $\mathcal{C}_i$ *and a dependent function* $g : (x : H_i(X)) \to P(x) \to Q(H_i(f)(x))$*.*

*We say that* $\mathcal{C}_n$ *is the* base category *for the sort signature* $H_0, \dots, H_{n-1}$*.*

The following examples will hopefully make clear the connection between the specification in Definition 3 and common classes of data types.

*Example 4 (Permutable trees).* For a single inductive type such as the type of trees $T(A)$ in Example 1, the sorts are specified by a single functor $H_0 : \mathcal{C}_0 \Rightarrow \mathsf{hSet}$ which maps the single object of $\mathcal{C}_0$ to the unit type $\mathbf{1}$. Objects in the base category $\mathcal{C}_1$ are thus pairs $(\star, W)$, where $W : \mathbf{1} \to \mathsf{hSet}$, and morphisms $(\star, W) \to (\star, V)$ are given by $f : \star \to \star$ in $\mathbf{1}$ (necessarily the identity morphism), together with a dependent function $g : (x : \mathbf{1}) \to W(x) \to V(x)$. It is easy to see that this category $\mathcal{C}_1$ is equivalent to the category $\mathsf{hSet}$.

*Example 5 (The finite types).* Consider the inductive family $\mathsf{Fin} : \mathbb{N} \to \mathsf{hSet}$ of finite types. Again, this is a single type family, i.e. we are in the case $n \equiv 1$. We have $H_0(\star) :\equiv \mathbb{N}$, and the base category $\mathcal{C}_1$ is equivalent to the category of $\mathbb{N}$-indexed families, where objects are families $X : \mathbb{N} \to \mathsf{hSet}$ and morphisms $\mathcal{C}_1(X, Y)$ are dependent functions $f : (n : \mathbb{N}) \to X(n) \to Y(n)$.
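As a hedged Lean 4 sketch of this inductive family (we write `Fin'` and constructor names of our choosing, to avoid clashing with Lean's built-in `Fin`):

```lean
-- the finite types as an inductive family indexed over ℕ:
-- Fin' n has exactly n elements
inductive Fin' : Nat → Type where
  | fzero : {n : Nat} → Fin' (n + 1)
  | fsucc : {n : Nat} → Fin' n → Fin' (n + 1)

-- the two elements of Fin' 2
example : Fin' 2 := Fin'.fzero
example : Fin' 2 := Fin'.fsucc Fin'.fzero
```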

*Example 6 (Contexts and types).* Let us consider the QIIT $(\mathsf{Con}, \mathsf{Ty})$ from Example 2. Here, we need two functors $H_0, H_1$, the first corresponding to $\mathsf{Con}$ and the second to $\mathsf{Ty}$. The first is given by $H_0(\star) :\equiv \mathbf{1}$ as in Example 4, since $\mathsf{Con}$ is a type on its own. Next, we need $H_1 : \mathcal{C}_1 \Rightarrow \mathsf{hSet}$. Applying the equivalence between $\mathcal{C}_1$ and $\mathsf{hSet}$ established in Example 4, we define $H_1$ to be the identity functor $H_1(A) :\equiv A$, since then $\mathsf{Ty} : H_1(\mathsf{Con}) \to \mathsf{hSet}$. The base category $\mathcal{C}_2$ is equivalent to the category $\mathsf{Fam}(\mathsf{hSet})$, whose objects are pairs $(A, B)$ where $A : \mathsf{hSet}$ and $B : A \to \mathsf{hSet}$, and whose morphisms $(A, B) \to (A', B')$ consist of a function $f : A \to A'$ together with a dependent function $g : (x : A) \to B(x) \to B'(f\,x)$.
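The objects and morphisms of $\mathsf{Fam}(\mathsf{hSet})$ can be sketched in Lean 4 as follows (a hedged illustration; `Fam` and `FamHom` are our names, and we elide set-truncation and the categorical structure):

```lean
-- an object of Fam: a base type together with a family over it
structure Fam where
  base : Type
  fib  : base → Type

-- a morphism (A, B) → (A', B'): a function f on bases,
-- together with a fibrewise function g lying over f
structure FamHom (X Y : Fam) where
  f : X.base → Y.base
  g : (x : X.base) → X.fib x → Y.fib (f x)
```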

*Example 7 (The Cauchy reals).* Recall that the Cauchy reals in the HoTT Book [34] are constructed by simultaneously defining $\mathbb{R} : \mathsf{hSet}$ and $\sim\; : \mathbb{R} \times \mathbb{R} \to \mathsf{hSet}$ (we ignore the fact that [34] uses $\mathcal{U}$ instead of $\mathsf{hSet}$). This time the sorts $H_0, H_1$ are given by $H_0(\star) :\equiv \mathbf{1}$ and $H_1(A) :\equiv A \times A$, corresponding to the fact that $\sim$ is a binary relation on $\mathbb{R}$. The base category has (up to equivalence) pairs $(X, Y)$ with $Y : X \times X \to \mathsf{hSet}$ as objects, and morphisms are defined accordingly.

*Example 8 (The full syntax of type theory).* Altenkirch and Kaposi [1] give the complete syntax of a basic type theory as an (at that point unspecified) QIIT. Although this construction is far too involved to be treated as an example in the rest of this paper (where we prefer to work with the simplified version of Example 2), we can give the sort signature $H_0, H_1, H_2, H_3$ of this QIIT. Apart from contexts $\mathsf{Con}$ and types $\mathsf{Ty}$, this definition also involves context morphisms $\mathsf{Tms}$ and terms $\mathsf{Tm}$:


We have $\mathsf{Tms} : \mathsf{Con} \to \mathsf{Con} \to \mathsf{hSet}$ and $\mathsf{Tm} : (\varGamma : \mathsf{Con}) \to \mathsf{Ty}(\varGamma) \to \mathsf{hSet}$, so that, beyond $H_0(\star) :\equiv \mathbf{1}$ and $H_1(C) :\equiv C$ as before, the new sort functors are $H_2(C, T) :\equiv C \times C$ (a context morphism has a source and a target context) and $H_3(C, T, M) :\equiv \Sigma(\varGamma : C).T(\varGamma)$ (a term has a context and a type in that context).


*Remark 9.* Although we work in type theory also in the meta-theory, we give the presentation informally in natural language. Formally, the specification of sorts and base categories of Definition 3 can be defined as an inductive-recursive definition [19] of the list $H_0, \dots, H_{n-1}$ simultaneously with a function that turns such a list into a category. Details can be found in Dijkstra's thesis [16, Sect. 4.3].

The main result of this section states that base categories of sort signatures are complete, i.e. have all small limits. By a small limit, we mean a limit of a diagram $D : \mathcal{I} \to \mathcal{C}$, where the shape category $\mathcal{I}$ has a set of objects, and the collection of morphisms between any two objects is a set. This result will be needed later to show that categories of QIIT algebras are complete. Recall that $\mathsf{hSet}$ has all small limits by a standard construction.

**Theorem 10 (Base categories are complete).** *For any sort signature* $H_0, \dots, H_{n-1}$*, the corresponding base category* $\mathcal{C}_n$ *has all small limits.*

*Proof.* All proofs can be found in the arXiv version of the paper [3].

### **3 Algebras**

Once the sorts of an inductive definition have been established, the next step is to specify the *constructors*. In this section, we will give a very general definition of constructor specifications, although we will mainly focus on two specific kinds: *point constructors*, which can be thought of as the operations of an algebraic signature, and *path constructors*, which correspond to the axioms.

Similarly to how sorts are specified inductively in Sect. 2, we construct suitable categories of algebras by starting with a finitely complete category $\mathcal{C}$, such as the one obtained from a sort signature, specifying a constructor on $\mathcal{C}$, and then extending $\mathcal{C}$ using this constructor specification to get a new finitely complete category $\mathcal{C}'$. This process is repeated until all constructors have been added, and we obtain the sought-after inductive type as the underlying set of an initial object of the category at the last stage, provided this initial object exists. In the case of the inductive definition of natural numbers, this process will turn out as follows:
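As a hedged Lean 4 illustration of the final stage for the natural numbers (the names `NatAlg`, `natAlg` and `fold` are ours, and set-truncation is elided):

```lean
-- stage 0: plain sets; stage 1: add the constructor zero (pointed sets);
-- stage 2: add the constructor suc, giving algebras (X, zero, succ)
structure NatAlg where
  X    : Type
  zero : X
  succ : X → X

-- ℕ itself carries such an algebra structure ...
def natAlg : NatAlg := ⟨Nat, 0, Nat.succ⟩

-- ... and is initial: the unique structure-preserving map
-- from natAlg into any other algebra B
def fold (B : NatAlg) : Nat → B.X
  | 0     => B.zero
  | n + 1 => B.succ (fold B n)
```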


#### **3.1 Relative Continuity and Constructor Specifications**

Roughly speaking, constructors at each stage are given by pairs of $\mathsf{hSet}$-valued functors $F$ and $G$ on $\mathcal{C}$, where $G$ is continuous (i.e. preserves all small limits). The intuition is that $F$ specifies the arguments of the constructor, while $G$ determines its target. For instance, in the example of the natural numbers, when specifying the constructor $\mathsf{suc} : \mathbb{N} \to \mathbb{N}$, $\mathcal{C}$ is the category of pointed sets, and both $F$ and $G$ are the forgetful functor to $\mathsf{hSet}$. The continuity condition on $G$ is needed for the corresponding category of algebras to be complete. Intuitively, it expresses that a constructor should only "construct" elements of one of the sorts, or equalities thereof.<sup>1</sup> In particular, a constant functor is usually not a valid choice for $G$.

Unfortunately, this simple description falls short of capturing many of the examples of QIITs mentioned in Sect. 1. The problem is that we want G to be able to depend on the elements of F. However, since F is assumed to be an arbitrary functor, its category of elements is not necessarily complete, and so we need to refine the notion of G being continuous to this case.

**Definition 11 (Relative continuity).** *Let* $\mathcal{C}$ *be a category,* $\mathcal{C}_0$ *a complete category, and* $U : \mathcal{C} \Rightarrow \mathcal{C}_0$ *a functor. If* $\mathcal{I}$ *is a small category, and* $X : \mathcal{I} \to \mathcal{C}$ *is a diagram, we say that a cone* $A \to X$ *in* $\mathcal{C}$ *is a* $U$-limit cone*, or* limit cone relative to $U$*, if the induced cone* $UA \to UX$ *is a limit cone in* $\mathcal{C}_0$*. A functor* $\mathcal{C} \Rightarrow \mathsf{hSet}$ *is* continuous relative to $U$ *if it maps* $U$*-limit cones to limit cones in* $\mathsf{hSet}$*.*

In the special case $\mathcal{C}_0 \equiv \mathsf{hSet}$, the functor $U$ in Definition 11 is continuous relative to itself. Also note that if $\mathcal{C}$ is complete and $U$ creates limits, then relative continuity with respect to $U$ reduces to ordinary continuity. If $\mathcal{C}$ is a complete category, and $F : \mathcal{C} \Rightarrow \mathsf{hSet}$ is an arbitrary functor, the category $\int^{\mathcal{C}} F$ of elements of $F$ is equipped with a forgetful functor into $\mathcal{C}$. We will implicitly consider relative limit cones and relative continuity with respect to this forgetful functor, unless specified otherwise. Note that if $\mathcal{C}$ is complete and $F$ is continuous, then $\int^{\mathcal{C}} F$ is also complete, and relative continuity of functors on $\int^{\mathcal{C}} F$ is the same as continuity, as observed above.

We can now give a precise definition of what is needed to specify a constructor:

**Definition 12 (Constructor specifications).** *A* constructor specification *on a complete category* $\mathcal{C}$ *is given by a functor* $F : \mathcal{C} \Rightarrow \mathsf{hSet}$*, the* argument functor*, together with a relatively continuous functor* $G : \int^{\mathcal{C}} F \Rightarrow \mathsf{hSet}$*, the* target functor*.*


Given a constructor specification, we can define the corresponding category of algebras. In Theorem 25, we will see that the assumptions of Definition 12 guarantee that this category is complete.

**Definition 13 (Category of algebras).** *Let* $(F, G)$ *be a constructor specification on a complete category* $\mathcal{C}$*. The* category of algebras *of* $(F, G)$ *is denoted* $\mathcal{C}.(F, G)$*, and is defined as follows: objects are pairs* $(X, \theta)$ *of an object* $X$ *of* $\mathcal{C}$ *and a dependent function* $\theta : (x : F X) \to G(X, x)$*, while morphisms* $(X, \theta) \to (Y, \psi)$ *are morphisms* $f : X \to Y$ *in* $\mathcal{C}$ *such that, for all* $x : F X$*,*

<sup>1</sup> More concretely, elements of a sort correspond to representable functors for algebras over a single generator for that sort, while equalities correspond to algebras with no generators and the given equality as the only relation. Clearly, representable functors are continuous, and the converse holds for reasonable functors (e.g. accessible ones). However, we do not attempt to make this construction precise here, and the following results do not depend on it.


$$
\psi(F(f)\,x) = G(\overline{f})(\theta\,x),
$$

*where* $\overline{f} : (X, x) \to (Y, F(f)\,x)$ *is the morphism in* $\int^{\mathcal{C}} F$ *determined by* $f$*.*

We think of $\mathcal{C}.(F, G)$ as a category of "dependent dialgebras" [20]. Note that there is an obvious forgetful functor $\mathcal{C}.(F, G) \Rightarrow \mathcal{C}$.

Similarly to how we defined sort specifications (Definition 3), we now have all the necessary notions in place to be able to give the full definition of a QIIT.

#### **Definition 14 (QIIT descriptions).** *A* QIIT description *is given by a sort signature* $H_0, \dots, H_{n-1}$ *with base category* $\mathcal{B}_0$*, together with a sequence of constructor specifications* $(F_0, G_0), \dots, (F_{k-1}, G_{k-1})$*, where each* $(F_i, G_i)$ *is a constructor specification on* $\mathcal{B}_i$*, and* $\mathcal{B}_{i+1} :\equiv \mathcal{B}_i.(F_i, G_i)$*.*


For Definition 14 to make sense, the categories $\mathcal{B}_i$ need to be complete, since constructor specifications are only defined on complete categories. This will follow from Theorem 25.

*Example 15 (Permutable trees).* The constructor $\mathsf{leaf} : T(A)$ from Example 1 can be specified by functors $F_0 : \mathsf{hSet} \Rightarrow \mathsf{hSet}$ and $G_0 : \int^{\mathsf{hSet}} F_0 \Rightarrow \mathsf{hSet}$, where $F_0(X) :\equiv \mathbf{1}$ and $G_0(X, l) :\equiv X$. Note how $F_0$ specifies the (trivial) arguments of $\mathsf{leaf}$, and $G_0$ the target. Next, the constructor $\mathsf{node} : (A \to T(A)) \to T(A)$ can be specified by functors $F_1 : \mathsf{hSet}_\bullet \Rightarrow \mathsf{hSet}$ and $G_1 : \int^{\mathsf{hSet}_\bullet} F_1 \Rightarrow \mathsf{hSet}$, where $\mathsf{hSet}_\bullet$ is the category of pointed sets (we think of the point as the previous constructor $\mathsf{leaf}$): $F_1$ and $G_1$ are defined as $F_1(X, l) :\equiv A \to X$ and $G_1(X, l, f) :\equiv X$, so that

$$\mathsf{node} : (f : F\_1(T(A), \mathsf{leaf})) \to G\_1(T(A), \mathsf{leaf}, f).$$

Theorem 18 will show that $G_0$ and $G_1$ are relatively continuous.

The category of algebras corresponding to this constructor specification $(F_1, G_1)$ for $\mathsf{node}$ is equivalent to the category whose objects are triples $(X, l, n)$ where $X : \mathsf{hSet}$, $l : X$, and $n : (A \to X) \to X$. After also specifying the $\mathsf{mix}$ constructor, the new category of algebras further contains a dependent function $p : (f : A \to X) \to (e : A \cong A) \to n(f) = n(f \circ e)$.
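An object of this final category of algebras can be sketched in Lean 4 as a record (hedged; the names are ours, a bijection record stands in for the auto-equivalences $e : A \cong A$, and set-truncation is elided):

```lean
-- auto-equivalences of A, with an explicit inverse
structure Auto (A : Type) where
  toFun    : A → A
  invFun   : A → A
  leftInv  : ∀ a, invFun (toFun a) = a
  rightInv : ∀ a, toFun (invFun a) = a

-- an algebra (X, l, n, p) for the constructors leaf, node and mix
structure TreeAlg (A : Type) where
  X : Type
  l : X                                  -- interprets leaf
  n : (A → X) → X                        -- interprets node
  p : ∀ (f : A → X) (e : Auto A),        -- interprets mix
      n f = n (f ∘ e.toFun)
```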

*Example 16 (Contexts and types).* The constructor $\sigma_{\mathsf{eq}}$ of type

$$((\Gamma : \mathsf{Con})(A : \mathsf{Ty}(\Gamma))(B : \mathsf{Ty}(\mathsf{ext}\,\Gamma \, A)) \to \mathsf{ext}\,(\mathsf{ext}\,\Gamma \, A)\, B =\_{\mathsf{Con}} \mathsf{ext}\,\Gamma\,(\sigma \,\Gamma \, A \, B))$$

from Example 2 is specified in the context of the previous constructors $\varepsilon$, $\mathsf{ext}$ and $\sigma$ by functors $F : \mathcal{C} \Rightarrow \mathsf{hSet}$ and $G : \int^{\mathcal{C}} F \Rightarrow \mathsf{hSet}$, where $\mathcal{C}$ is the category of algebras of the previous constructors, with

$$F(C, T, \varepsilon, \mathsf{ext}, \sigma) :\equiv \Sigma(\varGamma : C).\Sigma(A : T(\varGamma)).T(\mathsf{ext}\,\varGamma\,A)$$

and

$$G(C, T, \varepsilon, \texttt{ext}, \sigma, \varGamma, A, B) : \equiv \texttt{ext}\,(\texttt{ext}\,\varGamma\,A)\,B =\_C \texttt{ext}\,\varGamma\,(\sigma\,\varGamma\,A\,B).$$

Theorem 23 will show that $G$ is relatively continuous. The category of algebras corresponding to this constructor specification has as objects tuples $(C, T, e, c, b, s, s_{\mathrm{eq}})$, where $(C, T, e, c, b, s)$ is an algebra for the previous constructors, and

$$s_{\mathrm{eq}} : (\varGamma : C) \to (A : T(\varGamma)) \to (B : T(c\,\varGamma\,A)) \to c\,(c\,\varGamma\,A)\,B =_C c\,\varGamma\,(s\,\varGamma\,A\,B).$$
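Such an algebra can be written down directly in Lean 4 as a dependent record (a hedged sketch; the field names `eps`, `ext`, `base`, `sig`, `sigEq` are ours, and we elide the requirement that the carriers be sets):

```lean
-- an algebra (C, T, e, c, b, s, s_eq) for the constructors of Example 2
structure ConTyAlg where
  C     : Type
  T     : C → Type
  eps   : C                                        -- interprets ε
  ext   : (Γ : C) → T Γ → C                        -- interprets ext
  base  : (Γ : C) → T Γ                            -- interprets ι
  sig   : (Γ : C) → (A : T Γ) → T (ext Γ A) → T Γ  -- interprets σ
  sigEq : ∀ (Γ : C) (A : T Γ) (B : T (ext Γ A)),   -- interprets σ_eq
      ext (ext Γ A) B = ext Γ (sig Γ A B)
```

Note how the later fields depend on the earlier ones, mirroring the intermixed dependency of the constructors.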

#### **3.2 Point Constructors**

If C is the base category for a sort signature as in Definition 3, we can define specific target functors C ⇒ hSet which are guaranteed to be relatively continuous. Constructors having those as targets are referred to as *point constructors*. Intuitively, a point constructor is an operation that returns an element (point) of one of the sorts. The corresponding target functor is the forgetful functor that projects out the chosen sort. However, sorts can be dependent, so such a projection needs to be defined on a category of elements.

Specifically, let $\mathcal{C}$ be a finitely complete category, $H : \mathcal{C} \Rightarrow \mathsf{hSet}$ a functor, and $\mathcal{C}'$ the extended base category with one more sort indexed over $H$. Recall from Definition 13 that the objects of $\mathcal{C}'$ are pairs $(X, P)$, where $X$ is an object of $\mathcal{C}$, and $P$ is a family of sets indexed over $H X$. Let $V_H : \mathcal{C}' \Rightarrow \mathcal{C}$ be the forgetful functor. We define the *base target* functor corresponding to $H$ to be the functor $U_H : \int^{\mathcal{C}'} (H \circ V_H) \Rightarrow \mathsf{hSet}$ given by

$$U\_H(X, P, x) = P(x).$$

In other words, given an object $X$ of $\mathcal{C}$, a family $P$ over $H X$, and a point $x$ in the base, the functor $U_H$ returns the fibre of the family $P$ over $x$. The action of $U_H$ on morphisms is the obvious one.
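The action on objects can be sketched in Lean 4 (hedged; the names are ours, and the objects of $\mathcal{C}$ are abstracted as elements of a parameter type `Obj`):

```lean
-- an object of the category of elements of H ∘ V_H:
-- an object X of the underlying category, a family P over H X,
-- and a point x : H X in the base
structure BaseTargetObj (Obj : Type) (H : Obj → Type) where
  X : Obj
  P : H X → Type
  x : H X

-- the base target functor U_H returns the fibre of P over x
def baseTarget {Obj : Type} {H : Obj → Type}
    (o : BaseTargetObj Obj H) : Type :=
  o.P o.x
```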

*Example 17 (Permutable trees).* In Example 15, the functor $G_0 : \int^{\mathsf{hSet}} F_0 \Rightarrow \mathsf{hSet}$ specifying the target of $\mathsf{leaf}$ is the composition of the forgetful functor $\int^{\mathsf{hSet}} F_0 \Rightarrow \mathsf{hSet}$ with the base target functor for the only sort, in this case the identity $\mathrm{id} : \mathsf{hSet} \Rightarrow \mathsf{hSet}$.

Note that $U_H = \mathrm{id}$ in Example 17 is relatively continuous, as required by Definition 12. In the rest of this section, we will show that this is true in general. Given a category $\mathcal{C}$ and a functor $F : \mathcal{C} \Rightarrow \mathsf{hSet}$, it is well known that the slice category over $F$ of the functor category $\mathcal{C} \Rightarrow \mathsf{hSet}$ is equivalent to the functor category $\int^{\mathcal{C}} F \Rightarrow \mathsf{hSet}$ (see for example [24, Proposition 1.1.7]). Given a functor $G : \mathcal{C} \Rightarrow \mathsf{hSet}$ and a natural transformation $\alpha : G \to F$, we will refer to the functor $\overline{G} : \int^{\mathcal{C}} F \Rightarrow \mathsf{hSet}$ corresponding to $\alpha$ as the *functor of fibres* of $\alpha$. Concretely, $\overline{G}$ maps an object $(X, x)$, where $x : F X$, to the fibre of $\alpha_X$ over $x$. The following theorem is proved by noting that $U_H$ is a functor of fibres.

**Theorem 18 (Base target functors are relatively continuous).** *Let* $\mathcal{C}$ *be a complete category,* $H : \mathcal{C} \Rightarrow \mathsf{hSet}$ *any functor, and* $\mathcal{C}'$ *the extended base category corresponding to* $H$*. Then the base target functor* $U_H$ *is relatively continuous.*

#### **3.3 Reindexing Target Functors**

In many cases, we can obtain suitable target functors by composing the desired base target functor with the forgetful functor to the appropriate stage of the base category. When building constructors one at a time, it will follow from Theorems 25 and 10 applied to the previous steps that this forgetful functor is continuous, and the relative continuity of the target functor will follow. In more complicated examples, composing with a forgetful functor is not quite enough. We often want to "substitute into" or reindex a target functor to target a specific element. For example, in the context of Example 2, consider a hypothetical modified σ constructor of the form

$$\sigma' : \left((\varGamma, A, B) : \Sigma(\varGamma : \mathsf{Con}).\Sigma(A : \mathsf{Ty}(\varGamma)).\mathsf{Ty}(\mathsf{ext}\,\varGamma\,A)\right) \to \mathsf{Ty}(\mathsf{ext}\,\varGamma\,A).$$

We want the target functor to return the set Ty(ext Γ A), and not just Ty(x) for a new argument x, which is the result of the base target functor. We can obtain the desired target functor as a composition

$$
\int^{\mathcal{C}} F \xrightarrow{\;S\;} \int^{\mathsf{Fam}(\mathsf{hSet})} \pi_1 \xrightarrow{\;U_H\;} \mathsf{hSet},\tag{1}
$$

where $\mathcal{C}$ is the category with objects tuples $(C, T, \varepsilon, \mathsf{ext})$, $F : \mathcal{C} \Rightarrow \mathsf{hSet}$ is the functor giving the arguments of the constructor $\sigma'$, $U_H$ is the base target functor corresponding to the second sort, and $S$ is the functor defined by $S(C, T, \varepsilon, \mathsf{ext}, \varGamma, A, B) :\equiv (C, T, \mathsf{ext}\,\varGamma\,A)$.

Since the functors S that we compose with in order to "substitute" are of a special form, the resulting functor will still be relatively continuous when starting with a relatively continuous functor. This is made precise by the following result:

**Lemma 19 (Preservation of relative limit cones).** *Suppose we are given a commutative square of categories and functors* $F : \mathcal{C}' \Rightarrow \mathcal{D}'$ *over* $G : \mathcal{C} \Rightarrow \mathcal{D}$ *(via functors* $U' : \mathcal{C}' \Rightarrow \mathcal{C}$ *and* $V' : \mathcal{D}' \Rightarrow \mathcal{D}$*), together with functors* $U : \mathcal{C} \Rightarrow \mathcal{C}_0$ *and* $V : \mathcal{D} \Rightarrow \mathcal{D}_0$*, where* $\mathcal{C}_0$ *and* $\mathcal{D}_0$ *are complete, and* $G$ *maps* $U$*-limit cones to* $V$*-limit cones. Then* $F$ *maps* $(U \circ U')$*-limit cones to* $(V \circ V')$*-limit cones. In particular, if* $\mathcal{C}$ *and* $\mathcal{D}$ *are complete and* $G$ *is continuous, then* $F$ *preserves relative limit cones.*

*Example 20.* Starting from the situation in (1), we can form the corresponding commutative diagram, where $V : \mathcal{C} \Rightarrow \mathsf{Fam}(\mathsf{hSet})$ is the forgetful functor and hence continuous. It follows from the second statement of Lemma 19 that $S$ preserves relative limit cones, hence $G = U_H \circ S$ is relatively continuous by Theorem 18.

#### **3.4 Path Constructors**

Path constructors are constructors where the target functor G returns an *equality* type. They can e.g. be used to express laws when constructing an initial algebra of an algebraic theory as a QIT. We saw an example of this in Example 1, where we had a path constructor of the form

$$\mathsf{mix} : (f : A \to T(A)) \to (e : A \cong A) \to \mathsf{node}(f) = \mathsf{node}(f \circ e).$$

The argument functor for $\mathsf{mix}$ is entirely unproblematic. However, it is perhaps not so clear that the target functor, which sends $(X, l, n, f, e)$ to the equality type $n(f) =_X n(f \circ e)$, is relatively continuous. The aim of the current section is to show this for any functor of this form. We first observe that the prototypical such equality functor is relatively continuous, and then show that any other target functor for a path constructor can be obtained by substitution, using Lemma 19.

**Definition 21.** *Let* $\mathsf{Eq} : \int^{\mathsf{hSet}} (\mathrm{id} \times \mathrm{id}) \Rightarrow \mathsf{hSet}$ *be the functor defined on objects by* $\mathsf{Eq}(X, x, y) :\equiv (x =_X y)$ *and on morphisms by* $\mathsf{Eq}(f, p_x, p_y)(q) :\equiv p_x \cdot (\mathsf{ap}\,f\,q) \cdot p_y^{-1}$*.*

It is not hard to see that Eq is a functor. Furthermore, Eq is the functor of fibres of the obvious diagonal natural transformation Δ : id <sup>→</sup> id <sup>×</sup> id.
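In Lean 4, where $\mathsf{ap}$ is `congrArg`, the action of $\mathsf{Eq}$ on a morphism can be sketched as follows (hedged; we only exhibit a composite of the right type, with the equality proofs oriented so that the pieces compose):

```lean
-- given a morphism (f, px, py) and an equality q : x = y,
-- produce x' = y' by composing px, ap f q, and the inverse of py
example {X Y : Type} (f : X → Y) {x y : X} {x' y' : Y}
    (px : x' = f x) (py : y' = f y) (q : x = y) : x' = y' :=
  px.trans ((congrArg f q).trans py.symm)
```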

**Lemma 22.** *The standard equality functor* Eq *is relatively continuous.*

This lemma is central to the observation that a large class of equality functors are suitable targets for constructors:

**Theorem 23 (Equality functors are relatively continuous).** *Let* C *be a complete category,* F : C ⇒ hSet *any functor, and* G : ∫^C F ⇒ hSet *a relatively continuous functor. Suppose we are given two global elements* l, r *of* G*, i.e. natural transformations* l, r : **1** → G*. The map*

$$\mathsf{Eq}\_G(l,r) : \int^{\mathcal{C}} F \to \mathsf{hSet}$$

*with* Eq_G(l, r)(Y) = (l_Y =_{G(Y)} r_Y) *extends to a relatively continuous functor.*

*Example 24 (Permutable trees).* The target of the mix constructor from Example 1 can be obtained as an equality functor in this sense. We take G to be the underlying sort, which is relatively continuous by the results of the previous section. The global elements l and r are defined by l_{(X,l,n,f,e)} :≡ n(f) and r_{(X,l,n,f,e)} :≡ n(f ◦ e). Their naturality can easily be verified directly.

Iterating equality functors, one can also express *higher* path constructors, but in our limited setting of inductively defined *sets*, there is little reason to go beyond one level of path constructors — higher ones will have no effect on the resulting inductive type. However, we believe that the ease with which Theorem 23 can be applied iteratively will be an important feature when generalising our technique to general higher inductive types. We discuss this further in Sect. 5.

#### **3.5 Categories of Algebras are Complete**

Recall from Definition 13 that the category of algebras C.(F, G) for a constructor specification (F, G) on a complete category C has "dependent (F, G)-dialgebras" as objects, and maps that commute with the dialgebra structure as morphisms. In this section, we will show that C.(F, G) is complete, and that its forgetful functor is continuous. The significance of this result is twofold: first, it enables the use of limits when reasoning about algebras; in particular, we will show in Sect. 4 how, using products and equalisers, one can extend the classical equivalence between initiality and induction for ordinary inductive types to our setting. Second, it goes a long way towards establishing the existence of initial algebras: since a category of algebras over n + 1 constructors is complete, and the forgetful functor to the category of algebras over the first n preserves limits, the adjoint functor theorem says that this functor has a left adjoint if and only if it satisfies the solution set condition. Applying this argument at every stage, we get a left adjoint for the forgetful functor down to hSet, and in particular an initial object. There is no reason to expect the solution set condition to hold in this generality, but we expect it to follow from appropriate "accessibility" conditions on the argument functors. This is discussed further in Sect. 5.

**Theorem 25 (Categories of algebras are complete).** *Let* (F, G) *be a constructor specification on a complete category* C*. Then* C.(F, G) *is complete.*

### **4 Elimination Principles**

So far, we have given rules for specifying a QIIT by giving a sort signature and a list of constructors. As type-theoretical rules, these correspond to the formation and introduction rules for the QIIT. In this section, we introduce the corresponding elimination rules, stating that a QIIT is the smallest type closed under its constructors. We show that a categorical formulation of the elimination rules is equivalent to the universal property of initiality.

#### **4.1 The Section Induction Principle**

The elimination principle for an algebra X states that *every fibred algebra over* X *has a section*, where a fibred algebra over X is an algebra family "Q : X → hSet", and a section of it a dependent algebra morphism "(x : X) → Q(x)".<sup>2</sup> The usual correspondence between type families and fibrations extends to algebras, and so we formulate the elimination rule for X as X being section inductive in the category of algebras in the following sense:

**Definition 26 (Section inductive).** *An object* X *of a category* C *is* section inductive *if for every object* Y *of* C *and morphism* p : Y → X*, there exists* s : X → Y *such that* p ◦ s = id_X*.*

<sup>2</sup> See Dijkstra's thesis [16, Sect. 5.4] for the general definition of fibred algebras and their morphisms — here we restrict ourselves to examples only for space reasons.

For an algebra X, the existence of the underlying function(s) X → Y corresponds to the elimination rules, while the fact that they are algebra morphisms corresponds to the computation rules.
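To make this correspondence concrete in a simply typed setting, here is a small Haskell sketch (names and the natural-numbers example are ours, not from the paper): a fibred algebra over the Nat algebra is a motive together with one method per constructor, the section is given by recursion, and pairing the section with the identity on the base exhibits it as a right inverse of the first projection.

```haskell
-- A fibred algebra over the Nat algebra (zero, succ), in simple types:
-- a motive q together with one method per constructor.
data FibAlg q = FibAlg { qz :: q, qs :: Int -> q -> q }

-- The section s : (n : Nat) -> Q n, here just primitive recursion;
-- the defining clauses are exactly the computation rules.
section' :: FibAlg q -> Int -> q
section' alg 0 = qz alg
section' alg n = qs alg (n - 1) (section' alg (n - 1))

-- Pairing into the total algebra on Sigma(n : Nat). Q n; the first
-- projection of 'pair alg' is the identity, i.e. it really is a section.
pair :: FibAlg q -> Int -> (Int, q)
pair alg n = (n, section' alg n)

-- Example motive: Q n = sum of the first n naturals.
sumAlg :: FibAlg Int
sumAlg = FibAlg { qz = 0, qs = \n acc -> acc + (n + 1) }
```
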

*Example 27 (Permutable trees).* Consider permutable-tree algebras, i.e. tuples (X, l, n, p) as in Example 15. A fibred permutable-tree algebra over (X, l, n, p) consists of Q : X → hSet together with m_l : Q(l) and

$$\begin{aligned} m\_n: \quad &(f: A \to X) \to (g: (a:A) \to Q(f\,a)) \to Q(n\,f) \\ m\_p: \quad &(f: A \to X) \to (g: (a:A) \to Q(f\,a)) \to (e:A \cong A) \\ &\to m\_n \, f \,g = [\mathsf{ap} \, Q \, (p\,f\,e)] \, m\_n \, (f \circ e) \, (g \circ e) \end{aligned}$$

Here the type x =[p] y is the type of equalities between elements x : A and y : B of different types, themselves related by an equality proof p : A = B. This data can be arranged into an ordinary algebra with carrier Σ(x : X).Q(x), together with an algebra morphism π₁ : Σ(x : X).Q(x) → X. A section of π₁ is a dependent function h : (x : X) → Q(x). Since h comes from an algebra morphism, we further know e.g. h(l) = m_l and h(n f) = m_n f (h ◦ f). Conversely, every algebra morphism g : (X′, l′, n′, p′) → (X, l, n, p) gives rise to a fibred algebra (Q, m_l, m_n, m_p) by considering the fibres Q(x) = Σ(y : X′). g(y) = x of g. The points m_l, m_n and the path m_p arise from the proof that g preserves l′, n′ and p′.

*Example 28 (Contexts and types).* For context-and-types algebras from Example 16, a fibred algebra over (C, T, e, c, b, s, s_eq) consists of Q : C → hSet and R : (x : C) → T(x) → Q(x) → hSet, together with m_e : Q(e) and

$$\begin{aligned} m\_c: \quad & (\Gamma : C) \to (x : Q(\Gamma)) \to (A : T(\Gamma)) \to R(\Gamma, A, x) \to Q(c\,\Gamma \, A) \\ m\_b: \quad & (\Gamma : C) \to (x : Q(\Gamma)) \to R(\Gamma, b\,\Gamma, x) \\ m\_s: \quad & (\Gamma : C) \to (x : Q(\Gamma)) \to (A : T(\Gamma)) \to (y : R(\Gamma, A, x)) \to (B : T(c\,\Gamma \, A)) \\ & \to (z : R(c\,\Gamma \, A, B, m\_c\,\Gamma \, x \, A \, y)) \to R(\Gamma, s\,\Gamma \, A \, B, x) \\ m\_{s\_{\mathrm{eq}}}: \quad & (\Gamma : C) \to (x : Q(\Gamma)) \to (A : T(\Gamma)) \to (y : R(\Gamma, A, x)) \\ & \to (B : T(c\,\Gamma \, A)) \to (z : R(c\,\Gamma \, A, B, m\_c\,\Gamma \, x \, A \, y)) \\ & \to m\_c \, (c\,\Gamma \, A) \, (m\_c\,\Gamma \, x \, A \, y) \, B \, z = [\mathsf{ap} \, Q \, (s\_{\mathrm{eq}}\,\Gamma \, A \, B)] \\ & \qquad m\_c\,\Gamma \, x \, (s\,\Gamma \, A \, B) \, (m\_s\,\Gamma \, x \, A \, y \, B \, z) \end{aligned}$$

Again, this data can be arranged into an ordinary algebra with base C′ : hSet, T′ : C′ → hSet, where C′ = Σ(x : C).Q(x) and T′(x, q) = Σ(y : T(x)).R(x, y, q), together with an algebra morphism (π₁, π₁) : (C′, T′) → (C, T). A section of this morphism gives functions f : (x : C) → Q(x) and g : (x : C) → (y : T(x)) → R(x, y, f x) that preserve the algebra structure.

A general account of the equivalence between the usual formulation of the elimination rules and the section induction principle is in Dijkstra [16, Sect. 5.4].

#### **4.2 Initiality, and its Relation to the Section Induction Principle**

The section induction principle for an algebra X matches our intuitive understanding of the elimination rules for X quite well, but it is perhaps not a priori clear that satisfying it defines an algebra uniquely up to equivalence. In this section, we show that this is the case by proving that the section induction principle is equivalent to the categorical property of initiality. Recall that a type is *contractible* if it is equivalent to the unit type [34, Definition 3.11.1].

**Definition 29 (Initiality).** *An object* X *of a category* C *is* (homotopy) initial *if for every object* Y *of* C*, the set of morphisms* X → Y *is contractible.*

It is easy to see that initiality implies section induction, while the converse requires additional structure on C:

**Lemma 30.** *If an object* X *in a category* C *is initial, then it is section inductive. If* C *has finite limits and* X *is section inductive, then* X *is initial.*

From here, we can show the main theorem of the current section. The proof uses the fact that both statements involved are mere propositions, i.e. they have at most one proof.

**Theorem 31 (Initiality ≅ section induction).** *An object* X *in a category of algebras* C.(F, G) *being initial is equivalent to it being section inductive.*

As an application, we can now reason about QIITs using their categories of algebras. For instance, we get a short proof of the following fact:

**Corollary 32.** *The interval is equivalent to the unit type.*

*Proof.* By Theorem 31, the interval is the initial object in the category with objects Σ(X : hSet).Σ(x : X).Σ(y : X). x =_X y, while the unit type is the initial object in the category with objects Σ(X : hSet).X. By contractibility of singleton types [34, Lemma 3.11.8], the former is equivalent to the latter, and since initiality is a universal property, the two initial objects coincide up to equivalence.
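The singleton contractibility used in this proof can be checked directly in a proof assistant. A Lean 4 sketch (our own, with `Subtype` standing in for the Σ-type, which is harmless here):

```lean
-- Contractibility of the singleton type Σ (y : X), x = y: the centre of
-- contraction is (x, refl), and path induction contracts any inhabitant.
def center {X : Type} (x : X) : { y : X // x = y } := ⟨x, rfl⟩

theorem singleton_contr {X : Type} (x : X) (p : { y : X // x = y }) :
    center x = p := by
  cases p with
  | mk y h => cases h; rfl
```

This is exactly [34, Lemma 3.11.8] specialised to the set level.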

### **5 Conclusions and Further Work**

We have developed a semantic framework for QIITs: a QIIT description gives rise to a category of algebras, and the initial object of this category represents the types and constructors of the QIIT. This generalises the usual functorial semantics of inductive types to a more general setting. So far we have verified the appropriateness of this setting by means of examples. In future work, we would like to explicitly relate the syntax of QIITs to the corresponding semantics.

Our categories of algebras are complete. This is helpful for the metatheory of QIITs, as demonstrated by the proof of initiality being equivalent to section induction (Theorem 31), justifying elimination principles. Of course, completeness is not by itself sufficient to derive the existence of initial algebras, but it suggests that it should be possible to restrict the argument functors to guarantee this, possibly by reducing QIITs to a basic type former playing an analogous role to that of W-types for inductive types. We believe that completeness of the categories of algebras allows an existence proof using the adjoint functor theorem.

We have restricted our attention to QIITs, but we believe that our construction is applicable to general HITs (and even HIITs). While at first glance such an extension of our framework seems to require an internal theory of (∞,1)-categories, we believe that it is enough to keep track of only a very limited number of coherence conditions, making this extension possible even without solving the well-known problem of specifying an infinite tower of coherences in HoTT.

Other possible future directions include the combination of QIITs and induction-recursion, and the possibility of generalising coinductive types along similar lines. These generalisations should be driven by examples, similar to how the examples discussed in the current paper have motivated the need for a theory of QIITs.

**Acknowledgements.** We thank Ambrus Kaposi and Jakob von Raumer for many interesting discussions, and the anonymous referees for their valuable comments. This research was supported by EPSRC grants EP/M016994/1 and EP/K023837/1, as well as AFOSR award FA9550-16-1-0029.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Category Theory and Quantum Control

### **Guarded Traced Categories**

Sergey Goncharov and Lutz Schröder

Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany {Sergey.Goncharov,Lutz.Schroeder}@fau.de

**Abstract.** Notions of guardedness serve to delineate the admissibility of cycles, e.g. in recursion, corecursion, iteration, or tracing. We introduce an abstract notion of guardedness structure on a symmetric monoidal category, along with a corresponding notion of guarded traces, which are defined only if the cycles they induce are guarded. We relate structural guardedness, determined by propagating guardedness along the operations of the category, to geometric guardedness phrased in terms of a diagrammatic language. In our setup, the Cartesian case (recursion) and the co-Cartesian case (iteration) become completely dual, and we show that in these cases, guarded tracedness is equivalent to presence of a guarded Conway operator, in analogy to an observation on total traces by Hasegawa and Hyland. Moreover, we relate guarded traces to unguarded categorical uniform fixpoint operators in the style of Simpson and Plotkin. Finally, we show that partial traces based on Hilbert-Schmidt operators in the category of Hilbert spaces are an instance of guarded traces.

### **1 Introduction**

In models of computation, various notions of *guardedness* serve to control cyclic behaviour by allowing only guarded cycles, with the aim to ensure properties such as solvability of recursive equations or productivity. Typical examples are guarded process algebra specifications [6,29], coalgebraic guarded (co-)recursion [27,33], finite delay in online Turing machines [9], and productive definitions in intensional type theory [1,30], but also contractive maps in (ultra-)metric spaces [24].

A highly general model for unrestricted cyclic computations, on the other hand, are *traced monoidal categories* [22]; besides *recursion* and *iteration*, they cover further kinds of cyclic behaviour, e.g. in Girard's *Geometry of Interaction* [4,14] and quantum programming [3,34]. In the present paper we parametrize the framework of traced symmetric monoidal categories with a notion of guardedness, arriving at *(abstractly) guarded traced categories*, which effectively vary between two extreme cases: symmetric monoidal categories (nothing is guarded) and traced symmetric monoidal categories (everything is guarded). In terms of the standard diagrammatic language for traced monoidal categories, we decorate input and output gates of boxes to indicate guardedness; the diagram governing trace formation would then have the general form depicted in Fig. 1 – that is, we can only form traces connecting guarded (black) output gates to input gates that are unguarded (black), i.e. not assumed to be already guarded.

**Fig. 1.** Guarded trace

We provide basic structural results on our notion of abstract guardedness, and identify a wide array of examples. Specifically, we establish a geometric characterization of guardedness in terms of paths in diagrams; we identify a notion of *guarded ideal*, along with a construction of guardedness structures from guarded ideals and simplifications of this construction for the (co-)Cartesian and the Cartesian closed case; and we describe 'vacuous' guardedness structures where traces do not actually generate proper diagrammatic cycles. In terms of examples, we begin with the case where the monoidal structure is either product (Cartesian), corresponding to guarded recursion, or coproduct (co-Cartesian), for guarded iteration; the axioms for guardedness allow for a basic duality that indeed makes these two cases precisely dual. For total traces in Cartesian categories, Hasegawa and Hyland observed that trace operators are in one-to-one correspondence with *Conway fixpoint operators* [18,19]; we extend this correspondence to the guarded case, showing that guarded trace operators on a Cartesian category are in one-to-one correspondence with guarded Conway operators. In a more specific setting, we relate *guarded* traces in Cartesian categories to *unguarded* categorical uniform fixpoints as studied by Crole and Pitts [11] and by Simpson and Plotkin [37,38]. Concluding with a case where the monoidal structure is a proper tensor product, we show that the partial trace operation on (infinite-dimensional) Hilbert spaces is an instance of vacuous guardedness; this result relates to work by Abramsky, Blute, and Panangaden on traces over nuclear ideals, in this case over *Hilbert-Schmidt operators* [2].

**Related Work.** Abstract guardedness serves to determine definedness of a guarded trace operation, and thus relates to work on partial traces. We discuss work on nuclear ideals [2] in Sect. 6. In *partial traced categories* [17,26], traces are governed by a partial equational version (consisting of both strong and directed equations) of the Joyal-Street-Verity axioms; morphisms for which trace is defined are called *trace class*. A key difference to the approach via guardedness is that being trace class applies only to morphisms with inputs and outputs of matching types while guardedness applies to arbitrary morphisms, allowing for compositional propagation. Also, the axiomatizations are incomparable: Unlike for trace class morphisms [17, Remark 2.2], we require guardedness to be closed under composition with arbitrary morphisms (thus covering contractivity but not, e.g., monotonicity as in the modal μ-calculus); on the other hand, as noted by Jeffrey [21], guarded traces, e.g. of contractions, need not satisfy Vanishing II as a Kleene equality as assumed in partial traced categories. Some approaches treat traces as partial over objects [8,20]. In concrete algebraic categories, partial traces can be seen as induced by total traces in an ambient category of relations [5]. We discuss work on guardedness via endofunctors in Remark 23.

### **2 Preliminaries**

We recall requisite categorical notions; see [25] for a comprehensive introduction.

**Symmetric Monoidal Categories.** A *symmetric monoidal category* (**C**, ⊗, I) consists of a category **C** (with object class |**C**|), a bifunctor ⊗ (*tensor product*), and a *(tensor) unit* I ∈ |**C**|, together with coherent isomorphisms witnessing that ⊗ is, up to isomorphism, a commutative monoid structure with unit I. For the latter, we reserve the notation α_{A,B,C} : (A ⊗ B) ⊗ C ≅ A ⊗ (B ⊗ C) (*associator*), γ_{A,B} : A ⊗ B ≅ B ⊗ A (*symmetry*), and υ_A : I ⊗ A ≅ A (*left unitor*); the *right unitor* υ̂_A : A ⊗ I ≅ A is expressible via the symmetry. A symmetric monoidal category is *Cartesian* if the monoidal structure is finite product (i.e. ⊗ = ×, and I = 1 is a terminal object), and, dually, *co-Cartesian* if the monoidal structure is finite coproduct (i.e. ⊗ = +, and I = ∅ is an initial object). Coproduct injections are written in_i : X_i → X₁ + X₂ (i = 1, 2), and product projections pr_i : X₁ × X₂ → X_i. Various notions of algebraic tensor products also induce symmetric monoidal structures; see Sect. 6 for the case of Hilbert spaces.
One has an obvious expression language for objects and morphisms in symmetric monoidal categories [36], the former obtained by postulating basic objects and closing under I and <sup>⊗</sup>, and the latter by postulating basic morphisms of given profile and closing under <sup>⊗</sup>, I, composition, identities, and the monoidal isomorphisms, subject to the evident notion of *well-typedness*. Morphism expressions are conveniently represented as *diagrams* consisting of boxes representing the basic morphisms, with input and output gates corresponding to the given profile. Tensoring is represented by putting boxes on top of each other, and composition by wires connecting outputs to inputs [36]. In a *traced symmetric monoidal category* one has an additional operation (*trace*) that essentially enables the formation of loops in diagrams, as in Fig. 1 (but without decorations).

**Monads and (Co-)algebras.** An F-*coalgebra* for a functor F : **C** → **C** is a pair (X, f : X → F X) where X ∈ |**C**|, thought of as modelling states and generalized transitions [33]. A *final coalgebra* is a final object in the category of coalgebras (with **C**-morphisms h : X → Y such that (F h) f = g h as morphisms (X, f) → (Y, g)), denoted (νF, out : νF → F νF) if it exists. Dually, an F-*algebra* has the form (X, f : F X → X). A *monad* T = (T, μ, η) on a category **C** consists of an endofunctor T on **C** and natural transformations η : Id → T (*unit*) and μ : T² → T (*multiplication*) subject to standard equations [25]. As observed by Moggi [31], monads can be seen as capturing *computational effects* of programs, with T X read as a type of computations with side effects from T and results in X. In this view, the *Kleisli category* **C**_T of T, which has the same objects as **C** and Hom_{**C**_T}(X, Y) = Hom_{**C**}(X, T Y), is a category of side-effecting programs. A monad is *strong* if it is equipped with a *strength*, i.e. a natural transformation X × T Y → T(X × Y) satisfying evident coherence conditions (e.g. [31]). A T-algebra (A, a) is an *(Eilenberg-Moore)* T-*algebra* (for the *monad* T) if additionally a η = id and a (T a) = a μ_A; the category of T-algebras is denoted **C**^T.
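As a concrete illustration of the Kleisli-category view of effects, here is a Haskell sketch of our own (lists stand in for the finite powerset monad, modelling finite nondeterminism):

```haskell
import Control.Monad ((>=>))

-- Kleisli category of a monad T: objects as in C, morphisms X -> T Y,
-- composed with (>=>); 'pure' is the identity. Here T = [] models
-- finite nondeterminism, a small stand-in for the powerset monad.
type Kl a b = a -> [b]

idK :: Kl a a
idK = pure

compK :: Kl a b -> Kl b c -> Kl a c
compK = (>=>)

-- Two side-effecting programs:
halves :: Kl Int Int   -- nondeterministically keep a number or halve it
halves n = [n, n `div` 2]

signs :: Kl Int Int    -- nondeterministically flip the sign
signs n = [n, -n]
```

For instance, `compK halves signs` first branches on halving and then on the sign, collecting all outcomes in one list.
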

### **3 Guarded Categories**

We now introduce our notion of guarded structure. A standard example of guardedness is the notion of guarded definition in process algebra. E.g. in the definition P = a.P, the right-hand occurrence of P is guarded, ensuring unique solvability (by a process that keeps outputting a). A further example is contractivity of maps between complete metric spaces. We formulate abstract closure properties for *partial* guardedness where only some of the inputs and outputs of a morphism are guarded. Specifically, we distinguish *guarded outputs* and *guarded inputs* (D and B, respectively, in the following definition), with the intended reading that guarded outputs yield guarded data *provided* guarded data is already provided at guarded inputs, while unguarded inputs may be fed arbitrarily.

**Fig. 2.** Axioms of guarded categories

**Definition 1 (Guarded category).** An *(abstractly) guarded category* is a symmetric monoidal category (**C**, ⊗, I) equipped with distinguished subsets Hom•(A ⊗ B, C ⊗ D) ⊆ Hom(A ⊗ B, C ⊗ D) of *partially guarded morphisms* for A, B, C, D ∈ |**C**|, satisfying the following conditions:

**(uni**⊗**)** γ_{I,A} ∈ Hom•(I ⊗ A, A ⊗ I);

**(vac**⊗**)** f ⊗ g ∈ Hom•(A ⊗ B, C ⊗ D) for all f : A → C, g : B → D;

**(cmp**⊗**)** g ∈ Hom•(A ⊗ B, E ⊗ F) and f ∈ Hom•(E ⊗ F, C ⊗ D) imply f g ∈ Hom•(A ⊗ B, C ⊗ D);

**(par**⊗**)** for f ∈ Hom•(A ⊗ B, C ⊗ D), g ∈ Hom•(A′ ⊗ B′, C′ ⊗ D′), the evident transpose of f ⊗ g is in Hom•((A ⊗ A′) ⊗ (B ⊗ B′), (C ⊗ C′) ⊗ (D ⊗ D′)).

We emphasize that Hom•(A ⊗ B, C ⊗ D) is meant to depend individually on A, B, C, D and not just on A ⊗ B and C ⊗ D.

One easily derives a *weakening* rule stating that if f ∈ Hom•((A ⊗ A′) ⊗ B, C ⊗ (D′ ⊗ D)), then the obvious transpose of f is in Hom•(A ⊗ (A′ ⊗ B), (C ⊗ D′) ⊗ D).

We extend the standard diagram language for symmetric monoidal categories (Sect. 2), representing morphisms f ∈ Hom•(A ⊗ B, C ⊗ D) by *decorated boxes*, with black bars marking the *unguarded input* gates A and the *guarded output* gates D. Weakening then corresponds to shrinking the black bars of decorated boxes. Figure 2 depicts the above axioms in this language. Solid boxes represent the assumptions, while dashed boxes represent the conclusions. The latter only occur in the derivation process and do not form part of the actual diagrams representing concrete morphisms. We silently identify object expressions and sets of gates in diagrams. Given a (well-typed) morphism expression e, a judgement e ∈ Hom•(A ⊗ B, C ⊗ D), called a *guardedness typing* of e, is *derivable* if it can be derived from the assumed guardedness typing of the constituent basic boxes of e using the rules in Definition 1. We have an obvious notion of (directed) *paths* in diagrams; a path is *guarded* if it passes some basic box f through an unguarded input gate and a guarded output gate (intuitively, guardedness is then introduced along the path, as the passage through f will guarantee guarded output without assuming guarded input). We then have the following geometric characterization of guardedness typing:

**Theorem 2.** *For a well-typed morphism expression* e ∈ Hom(A ⊗ B, C ⊗ D)*, the guardedness typing* e ∈ Hom•(A ⊗ B, C ⊗ D) *is derivable iff in the diagram of* e*, every path from an input gate in* A *to an output gate in* D *is guarded.*

Every symmetric monoidal category has both a largest (Hom•(A ⊗ B, C ⊗ D) = Hom(A ⊗ B, C ⊗ D)) and a least guarded structure:

**Lemma and Definition 3 (Vacuous guardedness).** *Every symmetric monoidal category is guarded under taking* f ∈ Hom•(A ⊗ B, C ⊗ D) *iff* f *factors as*

$$A \otimes B \xrightarrow{\mathsf{id}\_A \otimes g} A \otimes E \otimes D \xrightarrow{h \otimes \mathsf{id}\_D} C \otimes D$$

*(eliding associativity) with* g : B → E ⊗ D*,* h : A ⊗ E → C*. This is the least guarded structure on* **C***, the* vacuous guarded structure*.*

E.g. the natural guarded structure on Hilbert spaces (Sect. 6) is vacuous.
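In (Set, ×), the factorization of Lemma 3 simply says that the guarded output D is computed from the input B alone, with E carrying whatever intermediate data the C-output still needs. A small Haskell sketch of this reading (names and the example are ours):

```haskell
-- Vacuous guardedness in (Set, x): f : A x B -> C x D is vacuously
-- guarded iff it factors as (h x id) . (id x g) with g : B -> E x D
-- and h : A x E -> C, so the D-component never looks at A.
vacuous :: (b -> (e, d)) -> ((a, e) -> c) -> (a, b) -> (c, d)
vacuous g h (a, b) = let (e, d) = g b in (h (a, e), d)

-- Example with E = String: the guarded output 'even b' depends only
-- on b, while the unguarded output combines a with data passed via E.
ex :: (Int, Int) -> (String, Bool)
ex = vacuous (\b -> (show b, even b)) (\(a, s) -> show a ++ "/" ++ s)
```

Any f of this shape trivially satisfies the closure conditions of Definition 1, which is why the structure is called vacuous.
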

**Remark 4 (Duality).** The rules and axioms in Fig. 2 are stable under 180° rotation, that is, under reversing arrows and applying the monoidal symmetry on both sides (this motivates decorating the *unguarded* inputs). Consequently, if **C** is guarded, then so is the dual category **C**^op, with guardedness given by f ∈ Hom•_{**C**^op}(A ⊗ B, C ⊗ D) iff the obvious transpose of f is in Hom•_{**C**}(D ⊗ C, B ⊗ A).

In case ⊗ is coproduct, we can simplify the description of partial guardedness:

**Proposition 5.** *Partial guardedness in a co-Cartesian category* (**C**, +, ∅) *is equivalently determined by distinguished subsets* Hom_σ(X, Y) ⊆ Hom(X, Y) *with* σ *ranging over coproduct injections* Y₂ → Y₁ + Y₂ ≅ Y*, subject to the rules on the right-hand side of Fig. 3, where* f : X →_σ Y *denotes* f ∈ Hom_σ(X, Y)*, with* f ∈ Hom•(X₁ + X₂, Y₁ + Y₂) *iff* f in₁ ∈ Hom_{in₂}(X₁, Y₁ + Y₂)*.*

We have used the mentioned rules for →_σ in previous work on guarded iteration [16] (with **(vac**+**)** called **(trv)**, and together with weakening, which as indicated above turns out to be derivable). By duality (Remark 4), we immediately have a corresponding description for the Cartesian case:

**Corollary 6.** *Partial guardedness in a Cartesian category* (**C**, ×, 1) *is equivalently determined by distinguished subsets* Hom^σ(X, Y) ⊆ Hom(X, Y) *with* σ *ranging over product projections* X ≅ X₁ × X₂ → X₁*, subject to the rules on the left-hand side of Fig. 3, where* f : X →^σ Y *denotes* f ∈ Hom^σ(X, Y)*, with* f ∈ Hom•(X₁ × X₂, Y₁ × Y₂) *iff* pr₂ f ∈ Hom^{pr₁}(X₁ × X₂, Y₂)*.*

$$\begin{array}{ll} (\mathsf{vac}\_{\times})\ \dfrac{f : X \to Z}{f\,\mathsf{pr}\_1 : X \times Y \to^{\mathsf{pr}\_2} Z} & (\mathsf{vac}\_{+})\ \dfrac{f : X \to Z}{\mathsf{in}\_1\, f : X \to\_{\mathsf{in}\_2} Z + Y} \\\\ (\mathsf{cmp}\_{\times})\ \dfrac{f : X \times Y \to^{\mathsf{pr}\_2} Z \quad g : V \to^{\sigma} X \quad h : V \to Y}{f\,\langle g, h\rangle : V \to^{\sigma} Z} & (\mathsf{cmp}\_{+})\ \dfrac{f : X \to\_{\mathsf{in}\_2} Y + Z \quad g : Y \to\_{\sigma} V \quad h : Z \to V}{[g, h]\,f : X \to\_{\sigma} V} \\\\ (\mathsf{par}\_{\times})\ \dfrac{f : X \to^{\sigma} Y \quad g : X \to^{\sigma} Z}{\langle f, g\rangle : X \to^{\sigma} Y \times Z} & (\mathsf{par}\_{+})\ \dfrac{f : X \to\_{\sigma} Z \quad g : Y \to\_{\sigma} Z}{[f, g] : X + Y \to\_{\sigma} Z} \end{array}$$

**Fig. 3.** Axioms of Cartesian (left) and co-Cartesian (right) guarded categories

**Remark 7.** In a co-Cartesian category, vacuous guardedness (Lemma 3) can equivalently be described by $f \in \mathsf{Hom}^{\bullet}(A + B, C + D)$ iff $f$ decomposes as $f = [\mathsf{in}_1 h, g]$ (uniquely, provided that $\mathsf{in}_1$ is monic), or, in terms of the description from Proposition 5, $u \in \mathsf{Hom}_{\mathsf{in}_2}(X, Y + Z)$ iff $u$ factors through $\mathsf{in}_1$. Of course, the dual situation obtains in Cartesian categories.

**Example 8 (Process algebra).** Fix a monad $T$ on $(\mathbf{C}, +, \emptyset)$ and an endofunctor $\Sigma : \mathbf{C} \to \mathbf{C}$ such that the generalized coalgebraic resumption transform $T_{\Sigma} = \nu\gamma.\, T(- + \Sigma\gamma)$ exists; we think of $T_{\Sigma}X$ as a type of processes that have side-effects in $T$ and perform communication actions from $\Sigma$, seen as a generalized signature. The Kleisli category $\mathbf{C}_{T_{\Sigma}}$ of $T_{\Sigma}$ is again co-Cartesian. Putting

$$f : X \to_{\mathsf{in}_2} T_{\Sigma}(Y + Z) \iff \mathsf{out}\, f \in \{\, T(\mathsf{in}_1 + \mathsf{id})\, g \mid g : X \to T(Y + \Sigma T_{\Sigma}(Y + Z)) \,\}$$

(cf. Sect. 2 for notation), we make $\mathbf{C}_{T_{\Sigma}}$ into a guarded category [16]. The standard motivating example of finitely nondeterministic processes is obtained by taking $T = \mathcal{P}_{\omega}$ (finite powerset monad) and $\Sigma = A \times -$ (action prefixing).
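To make the guardedness condition concrete, here is a small Python sketch (ours, not from the paper) for the motivating instance $T = \mathcal{P}_{\omega}$, $\Sigma = A \times (-)$: a finite process tree is $\mathsf{in}_2$-guarded with respect to a set of "guarded" result values precisely if none of them can be returned before at least one action has been performed.

```python
# A finite process over T = P_omega (finite powerset) and Sigma = A x (-):
# a frozenset of options, each either ('ret', v)     -- terminate with v --
#                              or     ('act', a, p)  -- do action a, then p.

def is_guarded(p, guarded_outputs):
    """A process term is in_2-guarded iff no value from the guarded summand
    occurs at the top level, i.e. outside of every action prefix."""
    return all(opt[1] not in guarded_outputs
               for opt in p if opt[0] == 'ret')

# z is guarded here: it only occurs after the action 'a'
p_ok = frozenset({('ret', 'y'),
                  ('act', 'a', frozenset({('ret', 'z')}))})
# z is unguarded here: it can be returned immediately
p_bad = frozenset({('ret', 'z')})
```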

**Example 9 (Metric spaces).** Let **C** be the Cartesian category of metric spaces and non-expansive maps. Taking $f : X \times Y \to^{\mathsf{pr}_2} Z$ iff $\lambda y.\, f(x, y)$ is contractive for every $x \in X$ makes **C** into a guarded Cartesian category.
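A quick numerical sanity check of this guardedness condition (our illustration; the choice of `f` is arbitrary): $f(x, y) = \sin x + y/2$ is non-expansive, and each section $\lambda y.\, f(x, y)$ is contractive with factor $1/2$, so $f$ is $\mathsf{pr}_2$-guarded in the sense above.

```python
import math

def f(x, y):
    # contractive in the second argument with factor 1/2, for every fixed x
    return math.sin(x) + y / 2

# empirical contraction factors of the sections lambda y. f(x, y)
grid = [(x, y1, y2) for x in range(-3, 4)
        for y1 in range(-3, 4) for y2 in range(-3, 4) if y1 != y2]
factors = [abs(f(x, y1) - f(x, y2)) / abs(y1 - y2) for x, y1, y2 in grid]
```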

### **4 Guardedness via Guarded Ideals**

Most of the time, the structure of a guarded category is determined by morphisms with only unguarded inputs and guarded outputs, which form an *ideal*:

**Definition 10 (Guarded morphisms).** A morphism $f : X \to Y$ in a guarded category is *guarded* (as opposed to only partially guarded) if $\upsilon_Y^{-1}\, f\, \hat{\upsilon}_X \in \mathsf{Hom}^{\bullet}(X \otimes I, I \otimes Y)$; we write $\overline{\mathsf{Hom}}(X, Y)$ for the set of guarded morphisms $f : X \to Y$.

**Definition 11 (Guarded ideal).** A family $G$ of subsets $G(X, Y) \subseteq \mathsf{Hom}(X, Y)$ ($X, Y \in |\mathbf{C}|$) in a monoidal category $(\mathbf{C}, \otimes, I)$ is a *guarded ideal* if it is closed under $\otimes$ and under composition with arbitrary $\mathbf{C}$-morphisms on both sides, and $G(I, I) = \mathsf{Hom}(I, I)$.

There is always a *least guarded ideal*, $G(X, Y) = \{\, g f \mid f : X \to I,\ g : I \to Y \,\}$. Moreover, as indicated above:

**Lemma and Definition 12.** *In a guarded category, the sets* $\overline{\mathsf{Hom}}(X, Y)$ *form a guarded ideal, the guarded ideal* induced *by the guarded structure.*

Conversely, it is clear that every guarded ideal *generates* a guarded structure by just closing under the rules of Definition 1.

**Definition 13 (Ideally guarded category).** A guarded category is *ideal* or *ideally guarded* (over G) if it is generated by some guarded ideal (G).

We give a more concrete description:

**Theorem 14.** *Let* $(\mathbf{C}, \otimes, I)$ *be ideally guarded over* $G$*. Then* $\mathsf{Hom}^{\bullet}(A \otimes B, C \otimes D)$ *consists of the morphisms of the form*

The transitions between guarded ideals and guarded structures are not in general mutually inverse: the guarded structure generated by the guarded ideal induced by a guarded structure may be smaller than the original one (Example 21), and the guarded ideal induced by the guarded structure generated by a guarded ideal $G$ may be larger than $G$ (Remark 16). We proceed to analyse the details.

**Proposition 15.** *On every symmetric monoidal category, the least guarded structure (Lemma 3) is ideal.*

**Remark 16.** Vacuously guarded categories need not induce the least guarded ideal (although, by the next results, this does hold in the Cartesian and the co-Cartesian case). In fact, by Lemma 3, the guarded ideal induced by the vacuous guarded structure consists of the morphisms of the form $(h \otimes \mathsf{id}_D)(\mathsf{id}_A \otimes g)$ (eliding associativity and the unitors) where $g : I \to E \otimes D$ and $h : A \otimes E \to I$.

This ideal will resurface in the discussion of Hilbert spaces (Sect. 6).

The situation is simpler in the Cartesian and, dually, in the co-Cartesian case.

**Lemma 17.** *Let* **C** *be ideally guarded over* $G$*, and suppose that every* $f \in G(X \otimes Y, Z)$ *factors through* $\hat{f} \otimes \mathsf{id} : X \otimes Y \to V \otimes Y$ *for some* $\hat{f} \in G(X, V)$*. Then the guardedness structure of* **C** *induces* $G$*.*

If $\otimes = +$, the premise of the lemma is automatic, since $f \in G(X + Y, Z)$ can be represented as $[f\,\mathsf{in}_1, f\,\mathsf{in}_2] = [\mathsf{id}, f\,\mathsf{in}_2]\,(f\,\mathsf{in}_1 + \mathsf{id})$ where $f\,\mathsf{in}_1 \in G(X, Z)$ by the closure properties of guarded ideals. Hence, we obtain

**Theorem 18.** *The guarded structure generated by a guarded ideal* $G$ *on a co-Cartesian category is equivalently described by* $\mathsf{Hom}_{\mathsf{in}_2}(X, Y + Z) = \{\, [\mathsf{in}_1, g]\, h \mid g \in G(W, Y + Z),\ h : X \to Y + W \,\}$*, and hence induces* $G$*.*

**Corollary 19.** *The guarded structure generated by a guarded ideal* $G$ *on a Cartesian category is equivalently described by* $\mathsf{Hom}^{\mathsf{pr}_1}(X \times Y, Z) = \{\, h\langle g, \mathsf{pr}_2\rangle \mid g \in G(X \times Y, W),\ h : W \times Y \to Z \,\}$*, and hence induces* $G$*.*

The description can be further simplified in the Cartesian closed case.

**Corollary 20.** *Given a guarded ideal* $G$ *on a Cartesian closed category, put* $f : X \times Y \to^{\mathsf{pr}_1} Z$ *iff* $\mathsf{curry}\, f \in G(X, Z^Y)$*. This describes the guarded structure induced by* $G$ *iff* $G$ *is* exponential*, i.e.* $f \in G(X, Y)$ *implies* $f^V \in G(X^V, Y^V)$*.*

(We leave it as an open question whether a similar characterization holds in the monoidal closed case.) Natural examples of both ideal and non-ideal guardedness are found in metric spaces:

**Example 21 (Metric spaces).** The guarded structure on metric spaces from Example 9 fails to be ideal: it induces the guarded ideal of contractive maps, which however generates the (ideal) guarded structure described by $f : X \times Y \to^{\mathsf{pr}_2} Z$ iff $f(x, y)$ is *uniformly* contractive in $y$, i.e. there is $c < 1$ such that for every $x$, $\lambda y.\, f(x, y)$ is contractive with contraction factor $c$.
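The gap between the two structures can be seen numerically (our illustrative choice of $f$): for $f(x, y) = \frac{x}{1+x}\, y$ on $X = [0, \infty)$, every section $\lambda y.\, f(x, y)$ is contractive, but the contraction factors approach $1$, so no uniform $c < 1$ exists.

```python
# f(x, y) = (x / (1 + x)) * y on X = [0, oo): each section lambda y. f(x, y)
# is contractive with factor x / (1 + x) < 1, but the factors approach 1,
# so no single c < 1 works uniformly.

def factor(x):
    return x / (1 + x)

xs = [2.0 ** k for k in range(20)]
pointwise = all(factor(x) < 1 for x in xs)    # guarded as in Example 9
uniform = max(factor(x) for x in xs) < 0.999  # fails: not ideally guarded
```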

A large class of ideally guarded structures arises as follows.

**Proposition 22.** *Let* **C** *be a Cartesian category equipped with an endofunctor* $\blacktriangleright : \mathbf{C} \to \mathbf{C}$ *and a natural transformation* $\mathsf{next} : \mathsf{Id} \to \blacktriangleright$*. Then the following definition yields a guarded ideal in* **C***:* $G(X, Y) = \{\, f\, \mathsf{next} \mid f : \blacktriangleright X \to Y \,\}$*. The arising guarded structure is* $\mathsf{Hom}^{\mathsf{pr}_1}(X \times Y, Z) = \{\, f\langle \mathsf{next}, \mathsf{pr}_2\rangle \mid f : \blacktriangleright(X \times Y) \times Y \to Z \,\}$*. If moreover* $\mathsf{next} : X \times Y \to \blacktriangleright(X \times Y)$ *factors through* $\mathsf{next} \times \mathsf{id} : X \times Y \to \blacktriangleright X \times Y$*, then* $\mathsf{Hom}^{\mathsf{pr}_1}(X \times Y, Z) = \{\, f\,(\mathsf{next} \times \mathsf{id}) \mid f : \blacktriangleright X \times Y \to Z \,\}$*.*

**Remark 23.** Proposition 22 connects our approach to previous work based precisely on the assumptions of the proposition [28] (in fact, the term guarded traced category is already used there, with a different meaning). A limitation of the approach via a functor $\blacktriangleright$ arises from the need to fix $\blacktriangleright$ globally, so that, e.g., the ideal guarded structure on metric spaces (Example 21) is not covered – capturing contractivity via $\blacktriangleright$ requires fixing a single global contraction factor.

The following instance of Proposition 22 has received extensive recent interest in programming semantics:

**Example 24 (Topos of Trees).** Let **C** be the *topos of trees* [7], i.e. the presheaf category $\mathbf{Set}^{\omega^{\mathsf{op}}}$ where $\omega$ is the preorder of natural numbers (starting from 1) ordered by inclusion. An object $X$ of **C** is thus a family $(X(n))_{n = 1, 2, \ldots}$ of sets with restriction maps $r_n : X(n+1) \to X(n)$. The *later*-endofunctor $\blacktriangleright : \mathbf{C} \to \mathbf{C}$ is defined by $\blacktriangleright X(1) = \{*\}$ and $\blacktriangleright X(n+1) = X(n)$, and the natural transformation $\mathsf{next}_X : X \to \blacktriangleright X$ by $\mathsf{next}_X(1) = \mathord{!} : X(1) \to \{*\}$, $\mathsf{next}_X(n+1) = r_n : X(n+1) \to X(n)$. Guarded morphisms according to Proposition 22 are called *contractive*, generalizing the metric setup. Contractive morphisms form an exponential ideal, so partial guardedness is described as in Corollary 20, and hence agrees with contractivity in part of the input as in [7, Definition 2.2].

**Fig. 4.** Axioms of guarded traced categories
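The stage-wise nature of contractive maps in the topos of trees can be simulated concretely (a sketch of ours, not from the paper): truncate at stage $n$, model $X(n)$ as length-$n$ lists, and observe that a map which uses its argument only at strictly earlier stages has a unique fixpoint, reached by iteration from any starting point.

```python
# Stage-n approximation of guarded streams in the topos of trees:
# an element of X(n) is a list of length n; a guarded (contractive)
# map consults its argument only at strictly earlier stages.

def step(s):
    # s |-> cons(0, map (+1) (next s)): s is used only through its
    # earlier stages s[:-1], so the map is contractive
    return [0] + [x + 1 for x in s[:-1]]

def fix(f, n, start):
    # a contractive map on stage-n approximants stabilizes after at
    # most n+1 iterations, from any starting point
    s = start
    for _ in range(n + 1):
        s = f(s)
    return s

nats = fix(step, 5, [99] * 5)  # unique solution of s = 0 :: map (+1) s
```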

### **5 Guarded Traces**

As indicated previously, the main purpose of our notion of abstract guardedness is to enable fine-grained control over the formation of feedback loops, viz. *traces*.

**Definition 25 (Guarded traced category).** We call a guarded category $(\mathbf{C}, \otimes, I)$ *guarded traced* if it is equipped with a *guarded trace operator*

$$\text{tr}\_{A,B,C,D}^U: \mathsf{Hom}^\bullet((A \otimes U) \otimes B, C \otimes (D \otimes U)) \to \mathsf{Hom}^\bullet(A \otimes B, C \otimes D),$$

visually corresponding to the diagram formation rule in Fig. 1, so that the adaptation of the Joyal-Street-Verity axiomatization of traced symmetric monoidal categories [22] shown in Fig. 4 is satisfied.

**Remark 26.** The versions of the sliding axiom in Fig. 4 differ in the way the loop is guarded. They are in line with duality (Remark 4): Sliding II arises from Sliding I by 180° rotation, and Sliding III is symmetric under 180° rotation.

We proceed to investigate the geometric properties of guarded traced categories, partly extending Theorem 2. The syntactic setting extends the one for guarded categories by additionally closing morphism expressions under the trace operator (interpreted diagrammatically as in Fig. 1), obtaining *traced morphism expressions*. Term formation thus becomes mutually recursive with guardedness typing: if $e$ is a traced morphism expression such that $e \in \mathsf{Hom}^{\bullet}((A \otimes U) \otimes B, C \otimes (D \otimes U))$ is derivable, then $\mathsf{tr}^U_{A,B,C,D}(e)$ is a traced morphism expression, and $\mathsf{tr}^U_{A,B,C,D}(e) \in \mathsf{Hom}^{\bullet}(A \otimes B, C \otimes D)$ is derivable. *Traced diagrams* consist of finitely many (decorated) basic boxes and wires connecting output gates of basic boxes to input gates, with each gate attached to at most one wire; open gates are regarded as inputs or outputs, respectively, of the whole diagram. Of course, acyclicity is not required. We first note that the easy direction of Theorem 2 adapts straightforwardly to the setting with traces:

**Proposition 27.** *Let* $e$ *be a traced morphism expression such that* $e \in \mathsf{Hom}^{\bullet}(A \otimes B, C \otimes D)$ *is derivable. Then in the diagram of* $e$*, all loops and all paths from input gates in* $A$ *to output gates in* $D$ *are guarded (p. 4).*

Remarkably, the converse of Proposition 27 in general fails in several ways:

**Example 28.** The left diagram below

(2)

shows that guardedness typing is not closed under equality of traced morphism expressions: Write e for the expression inducing the dashed box. By Proposition 27, e, and hence tr(e), fail to type as indicated. However, tr(e) = gf, for which the overall guardedness typing indicated is easily derivable.

Moreover, the diagram on the right above satisfies the necessary condition from Proposition 27 but is not induced by an expression for which the indicated guardedness typing is derivable, essentially because both ways of cutting the loop violate the necessary condition from Proposition 27.

However, if **C** is ideally guarded over a guarded ideal $G$, we do have a converse to Proposition 27: By Theorem 14, we can then restrict basic boxes in diagrams to be either *guarded*, i.e. have only black gates, or *unguarded*, i.e. have only white gates. We call the correspondingly restricted diagrams *ideally guarded*. (We emphasize that the guardedness typing of *composite* ideally guarded diagrams still needs to mix guarded and unguarded inputs and outputs.) A path in an ideally guarded diagram is guarded iff it passes through a guarded basic box.

The left-hand diagram in (2) is in fact ideally guarded, so guardedness typing fails to be closed under equality also in the ideally guarded case. However, for ideally guarded diagrams we have the following converse of Proposition 27.

**Theorem 29.** *Let* $\Delta$ *be an ideally guarded diagram, with sets of input and output gates disjointly decomposed as* $A \mathbin{\dot\cup} B$ *and* $C \mathbin{\dot\cup} D$*, respectively. If every loop in* $\Delta$ *and every path from a gate in* $A$ *to a gate in* $D$ *is guarded, then* $\Delta$ *is induced by a traced morphism expression* $e$ *such that* $e \in \mathsf{Hom}^{\bullet}(A \otimes B, C \otimes D)$ *is derivable.*

We next take a look at the Cartesian and co-Cartesian cases. Recall that by Proposition 5, the definition of guarded category can be simplified if ⊗ = + (and dually if ⊗ = ×). This simplification extends to guarded traced categories by generalizing Hyland-Hasegawa's equivalence between Cartesian trace operators and Conway fixpoint operators [18,19].

**Definition 30 (Guarded Conway operators).** Let **C** be a guarded co-Cartesian category. We call an operator (−)† of profile

$$f \in \mathsf{Hom}\_{\sigma + \mathsf{id}}(X, Y + X) \mapsto f^\dagger \in \mathsf{Hom}\_{\sigma}(X, Y) \tag{3}$$

a *guarded iteration operator* if it satisfies

– *fixpoint:* $f^{\dagger} = [\mathsf{id}, f^{\dagger}]\, f$ for $f : X \to_{\mathsf{in}_2} Y + X$;

and a *Conway iteration operator* if it additionally satisfies


Furthermore, we distinguish the following principles:


and call (−)† *squarable* or *uniform* if it satisfies squaring or uniformity, respectively.
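Operationally, in set-based examples the fixpoint law already determines $f^{\dagger}$: unfold $f : X \to Y + X$ until the exit summand $Y$ is reached. A minimal Python sketch of ours (function names hypothetical; termination corresponds to guardedness/productivity of $f$):

```python
# Operational reading of the fixpoint law  f_dagger = [id, f_dagger] . f
# for f : X -> Y + X (here: f returns ('inl', y) or ('inr', x)).

def iterate(f, x):
    tag, v = f(x)
    while tag == 'inr':  # still in the X-summand: keep unfolding
        tag, v = f(v)
    return v             # reached the exit summand Y

def countdown(n):
    # a simple productive map: decrements until it exits with 'done'
    return ('inl', 'done') if n == 0 else ('inr', n - 1)
```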

*Guarded (Conway) recursion operators* (−)† on guarded Cartesian categories are defined dually in a straightforward manner. We collect the following facts about guarded iteration operators for further reference.

**Lemma 31.** *Let* $(-)^{\dagger}$ *be a guarded iteration operator on* $(\mathbf{C}, +, \emptyset)$*.*


**Proposition 32.** *A guarded co-Cartesian category* **C** *is traced iff it is equipped with a guarded Conway iteration operator* $(-)^{\dagger}$*, with mutual conversions as in the total case [18,19].*

**Example 33 (Guarded Conway operators).** We list some examples of guarded Conway iteration/recursion operators. In all cases except 2, Conwayness follows from uniqueness of fixpoints [16, Theorem 17].


**Guarded vs. Unguarded Recursion.** We proceed to present a class of examples relating guarded and unguarded recursion. For motivation, consider the category $(\mathbf{Cpo}, \times, 1)$ of complete partial orders (cpos) and continuous maps. This category nearly supports recursion via least fixpoints, except that, e.g., $\mathsf{id} : X \to X$ has a least fixpoint only if $X$ has a bottom. The following equivalent approaches involve the *lifting monad* $(-)_{\bot}$, which adjoins a fresh bottom $\bot$ to a given $X \in |\mathbf{Cpo}|$.

*Classical approach* [38,39]: Define a total recursion operator (−)‡ on the category **Cpo**<sup>⊥</sup> of *pointed cpos* and continuous maps, using least fixpoints.

*Guarded approach* (cf. [28]): Extend **Cpo** to a guarded category: $f : X \times Y \to^{\mathsf{pr}_2} Z$ iff $f \in \{\, g\,(\mathsf{id} \times \eta) \mid g : X \times Y_{\bot} \to Z \,\}$ (see Proposition 22), and define a guarded recursion operator sending $f = g\,(\mathsf{id} \times \eta) : Y \times X \to^{\mathsf{pr}_2} X$ to $f^{\dagger} = g\langle \mathsf{id}, \hat{f} \rangle : Y \to X$ with $\hat{f}(y) \in X_{\bot}$ calculated as the least fixpoint of $\lambda z.\, \eta\, g(y, z)$.
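A concrete rendering of the guarded approach (our sketch, under simplifying assumptions: flat cpos only, with `None` playing the role of the fresh bottom $\bot$ and $\eta$ the identity on representations; the example `g` is an arbitrary choice):

```python
# Flat cpo X_bot modelled with None as the fresh bottom. Given a monotone
# g : Y x X_bot -> X, compute the least fixpoint of z |-> eta(g(y, z))
# by Kleene iteration from bottom, then f_dagger = g<id, f_hat>.

def least_fixpoint(g, y, max_iter=1000):
    z = None                 # start from bottom
    for _ in range(max_iter):
        z_new = g(y, z)      # eta is the identity on representations
        if z_new == z:
            return z
        z = z_new
    raise RuntimeError("no fixpoint within iteration bound")

def f_dagger(g, y):
    return g(y, least_fixpoint(g, y))

# example: g(y, bottom) = 0, g(y, z) = min(z + 1, y) climbs up to y
g = lambda y, z: 0 if z is None else min(z + 1, y)
```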

Pointed cpos are always of the form $X_{\bot}$ with $X \in |\mathbf{Cpo}|$, which indicates that $(-)^{\ddagger}$ is a special case of $(-)^{\dagger}$. This is no longer true in more general cases, where the connection between $(-)^{\ddagger}$ and $(-)^{\dagger}$ is more intricate. We show that $(-)^{\ddagger}$ and $(-)^{\dagger}$ are nevertheless equivalent under reasonable assumptions.

**Definition 34 (**[11]**).** A *let-ccc with a fixpoint object* is a tuple $(\mathbf{C}, T, \Omega, \omega)$, consisting of a Cartesian closed category **C**, a strong monad $T$ on it, an initial $T$-algebra $(\Omega, \mathsf{in})$ and an equalizer $\omega : 1 \to \Omega$ of $\mathsf{in}\,\eta : \Omega \to \Omega$ and $\mathsf{id} : \Omega \to \Omega$.

The key requirement is the last one, satisfied, e.g., for **Cpo** and the lifting monad. Given a monad $T$ on **C**, $\mathbf{C}^{T}_{\star}$ denotes the category of $T$-algebras and **C**-morphisms (instead of $T$-algebra homomorphisms).

**Proposition 35 (**[37, Theorem 4.6]**).** *Let* $(\mathbf{C}, T, \Omega, \omega)$ *be a let-ccc with a fixpoint object. Then* $\mathbf{C}^{T}_{\star}$ *has a unique* $\mathbf{C}^{T}_{\star}$*-uniform recursion operator* $(-)^{\ddagger}$*.*

By [38, Theorem 4], the operator $(-)^{\ddagger}$ in Proposition 35 is Conway, in particular, by Lemma 31, squarable, if **C** has a natural numbers object and $T$ is an *equational lifting monad* [10], such as $(-)_{\bot}$. There are however further squarable operators obtained via Proposition 35, e.g. for the partial state monad $TX = ((X \times S)_{\bot})^{S}$ [11]. By Lemma 31, the following result applies in particular in the setup of Proposition 35 under the additional assumption of squarability.

**Theorem 36.** *Let* $T$ *be a strong monad on a Cartesian category* **C***. The following gives a bijective correspondence between squarable dinatural recursion operators* $(-)^{\ddagger}$ *on* $\mathbf{C}^{T}_{\star}$ *and squarable dinatural guarded recursion operators* $(-)^{\dagger}$ *on* **C** *ideally guarded over* $\overline{\mathsf{Hom}}(X, Y) = \{\, f \eta \mid f : TX \to Y \,\}$*:*

$$(f : B \times A \to A)^{\ddagger} = a\,\big(\eta f(\mathsf{id} \times a)\big)^{\dagger} \qquad \text{for } (A, a) \in |\mathbf{C}^{T}_{\star}| \tag{4}$$

$$\big(f = g(\mathsf{id} \times \eta) : Y \times X \to X\big)^{\dagger} = g\,\langle \mathsf{id}, (\eta g)^{\ddagger} \rangle \tag{5}$$

*(in* (5) *we call on a slight extension of* $(-)^{\ddagger}$*; the right-hand side of* (4) *is defined because* $\eta f(\mathsf{id} \times a)$ *factors as* $\eta f(\mathsf{id} \times a(Ta)\eta)$*). Moreover,* $(-)^{\dagger}$ *is Conway iff so is* $(-)^{\ddagger}$*.*

### **6 Vacuous Guardedness and Nuclear Ideals**

We proceed to discuss traces in vacuously guarded categories (Lemma 3), and show that the partial trace operation in the category of (possibly infinite-dimensional) Hilbert spaces [2] in fact lives over the vacuous guarded structure. We first note that vacuous guarded structures are traced as soon as a simple rewiring operation satisfies a suitable well-definedness condition (similar to the one defining traced nuclear ideals [2, Definition 8.14]):

**Proposition 37.** *Let* $(\mathbf{C}, \otimes, I)$ *be vacuously guarded. If for* $f \in \mathsf{Hom}^{\bullet}((A \otimes U) \otimes B, C \otimes (D \otimes U))$ *with factorization* $f = (h \otimes \mathsf{id}_{D \otimes U})(\mathsf{id}_{A \otimes U} \otimes g)$ *(eliding associativity),* $g : B \to E \otimes D \otimes U$*,* $h : A \otimes U \otimes E \to C$ *as per Lemma 3, the composite*

$$A \otimes B \xrightarrow{\text{id}\_A \otimes g} A \otimes E \otimes D \otimes U \cong A \otimes U \otimes E \otimes D \xrightarrow{h \otimes \text{id}\_D} C \otimes D \tag{6}$$

*depends only on* $f$*, then* **C** *is guarded traced, with* $\mathsf{tr}^{U}_{A,B,C,D}(f)$ *defined as* (6)*.*

Diagrammatically, the trace in a vacuously guarded category is thus given by

We proceed to instantiate the above to Hilbert spaces. On a more abstract level, a *dagger symmetric monoidal category* [35] (or *tensored* $*$-*category* [2]) is a symmetric monoidal category $(\mathbf{C}, \otimes, I)$ equipped with an identity-on-objects strictly involutive functor $(-)^{\dagger} : \mathbf{C} \to \mathbf{C}^{\mathsf{op}}$ coherently preserving the symmetric monoidal structure. The main motivation for dagger symmetric monoidal categories is to capture categories that are similar to (dagger) compact closed categories in that they admit a canonical trace construction for certain morphisms, but fail to be closed, much less compact closed. The "compact closed part" of a dagger symmetric monoidal category is axiomatized as follows.

**Definition 38 (Nuclear Ideal,** [2]**).** A *nuclear ideal* $N$ in a dagger symmetric monoidal category $(\mathbf{C}, \otimes, I, (-)^{\dagger})$ is a family of subsets $N(X, Y) \subseteq \mathsf{Hom}_{\mathbf{C}}(X, Y)$, $X, Y \in |\mathbf{C}|$, satisfying the following conditions:


$$A \;\cong\; A \otimes I \xrightarrow{\;\mathsf{id}_A \otimes \theta(g)\;} A \otimes (B^{\dagger} \otimes C) \;\cong\; (B^{\dagger} \otimes A) \otimes C \xrightarrow{\;\theta(f)^{\dagger} \otimes \mathsf{id}_C\;} I \otimes C \;\cong\; C$$

The above definition is slightly simplified in that we elide a covariant involutive functor $\overline{(-)} : \mathbf{C} \to \mathbf{C}$, capturing, e.g., complex conjugation; i.e., we essentially restrict to spaces over the reals.

We proceed to present a representative example of a nuclear ideal in the category of Hilbert spaces. Recall that a *Hilbert space* [23] $H$ over the field $\mathbf{R}$ of reals is a vector space with an *inner product* $\langle -, - \rangle : H \times H \to \mathbf{R}$ that is complete as a *normed space* under the induced *norm* $\|x\| = \sqrt{\langle x, x \rangle}$. Let **Hilb** be the category of Hilbert spaces and bounded linear operators.

Clearly, $\mathbf{R}$ itself is a Hilbert space; linear operators $X \to \mathbf{R}$ are conventionally called *functionals*. More generally, we consider *(multi-)linear* functionals $X_1 \times \ldots \times X_n \to \mathbf{R}$, i.e. maps that are linear in every argument. Such a functional is *bounded* if $|f(x_1, \ldots, x_n)| \leq c\, \|x_1\| \cdots \|x_n\|$ for some constant $c \in \mathbf{R}$. We can move between bounded linear operators and bounded linear functionals, similarly as we can move between relations and functions to the Booleans:

**Proposition 39 (**[23, Theorem 2.4.1]**).** *Given a bounded linear operator* $f : X \to Y$*,* $f^{\circ}(x, y) = \langle fx, y \rangle$ *defines a bounded linear functional* $f^{\circ}$*, and every bounded linear functional* $X \times Y \to \mathbf{R}$ *arises in this way.*

**Definition 40 (Hilbert-Schmidt operators/functionals).** A bounded linear functional $f : X_1 \times \ldots \times X_n \to \mathbf{R}$ is *Hilbert-Schmidt* if the sum

$$\sum\_{x\_1 \in B\_1} \dots \sum\_{x\_n \in B\_n} (f(x\_1, \dots, x\_n))^2$$

is finite for some, and then any, orthonormal bases $B_1, \ldots, B_n$ of $X_1, \ldots, X_n$, respectively. A bounded linear operator $f : X \to Y$ is *Hilbert-Schmidt* if the induced functional $f^{\circ}$ (Proposition 39) is Hilbert-Schmidt, equivalently if $\sum_{x \in B} \|fx\|^2$ is finite for some, and then any, orthonormal basis $B$ of $X$. We denote by $\mathsf{HS}(X, Y)$ the space of all Hilbert-Schmidt operators from $X$ to $Y$.

For $X, Y \in |\mathbf{Hilb}|$, the space of Hilbert-Schmidt functionals $X \times Y \to \mathbf{R}$ is itself a Hilbert space, denoted $X \otimes Y$, with the pointwise vector space structure and the inner product $\langle f, g \rangle = \sum_{x \in B} \sum_{y \in B'} f(x, y)\, g(x, y)$ where $B$ and $B'$ are orthonormal bases of $X$ and $Y$, respectively. By virtue of the equivalence between $f$ and $f^{\circ}$, this induces a Hilbert space structure on $\mathsf{HS}(X, Y)$, with induced norm $\|f\|_2 = \sqrt{\sum_{x \in B} \|fx\|^2}$. The operator $\otimes$ forms part of a dagger symmetric monoidal structure on **Hilb**, with unit $\mathbf{R}$. For a bounded linear operator $f : X \to Y$, $f^{\dagger} : Y \to X$ is the *adjoint operator*, uniquely determined by the equation $\langle x, f^{\dagger} y \rangle = \langle fx, y \rangle$. The tensor product of $f : A \to B$ and $g : C \to D$ is the operator sending $h : A \times C \to \mathbf{R}$ to $h(f^{\dagger} \times g^{\dagger}) : B \times D \to \mathbf{R}$. Given $a \in A$ and $c \in C$, let us denote by $a \otimes c \in A \otimes C$ the functional $(a', c') \mapsto \langle a, a' \rangle \langle c, c' \rangle$, and so, with the above $f$ and $g$, $(f \otimes g)(a \otimes c) = f(a) \otimes g(c)$.
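As a finite-dimensional sanity check (ours, not from the paper): for a matrix $f$, the quantity $\|f\|_2^2 = \sum_{x \in B} \|fx\|^2$ is independent of the chosen orthonormal basis $B$ — it is the squared Frobenius norm.

```python
import math

def hs_norm_sq(f, basis):
    # sum over basis vectors e of ||f e||^2, for f given as a 2x2 matrix
    def apply(f, v):
        return [f[0][0]*v[0] + f[0][1]*v[1], f[1][0]*v[0] + f[1][1]*v[1]]
    return sum(sum(c * c for c in apply(f, e)) for e in basis)

f = [[1.0, 2.0], [3.0, 4.0]]
std = [[1.0, 0.0], [0.0, 1.0]]
t = 0.7  # any rotated orthonormal basis gives the same value
rot = [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]
```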

**Proposition 41 (**[2]**).** *The Hilbert-Schmidt operators form a nuclear ideal in* **Hilb** *with* $\theta : \mathsf{HS}(X, Y) \cong \mathsf{Hom}(\mathbf{R}, X^{\dagger} \otimes Y)$ *defined by*

$$
\theta(f:X\to Y)(r:\mathbf{R})(x:X,y:Y) = r\langle fx, y\rangle.
$$

A crucial fact underlying the proof of Proposition 41 is that $\mathsf{HS}(X, Y)$ is isomorphic to $X^{\dagger} \otimes Y$, naturally in $X$ and $Y$. We emphasize that what makes the case of **Hilb** significant is that we do not restrict to finite-dimensional Hilbert spaces. In that case, all bounded linear operators would be Hilbert-Schmidt and the corresponding category would be (dagger) compact closed [35]. In the infinite-dimensional case, identities need not be Hilbert-Schmidt, so $\mathsf{HS}$ is indeed only an ideal and not a subcategory.

Let $N^2(X, Y) = \{\, g^{\dagger} h : X \to Y \mid h \in N(X, Z),\ g \in N(Y, Z) \,\}$ for any nuclear ideal $N$. The main theorem of this section can now be stated as follows.

**Theorem 42.** *1. The guarded ideal induced by the vacuous guarded structure on* **Hilb** *(see (1)) is precisely* $\mathsf{HS}^2$*, and* **Hilb** *is guarded traced over* $\mathsf{HS}^2$*.*

*2. Guarded traces in* **Hilb** *commute with* $(-)^{\dagger}$ *in the sense that if* $f \in \mathsf{Hom}^{\bullet}((A \otimes U) \otimes B, C \otimes (D \otimes U))$*, then* $\gamma_{B, A \otimes U}\, f^{\dagger}\, \gamma_{D \otimes U, C} \in \mathsf{Hom}^{\bullet}((D \otimes U) \otimes C, B \otimes (A \otimes U))$ *and* $\mathsf{tr}^{U}_{D,C,B,A}(\gamma_{B, A \otimes U}\, f^{\dagger}\, \gamma_{D \otimes U, C}) = \gamma_{A,B}\, (\mathsf{tr}^{U}_{A,B,C,D}(f))^{\dagger}\, \gamma_{C,D}$*.*

Clause 1 is a generalization of the result in [2, Theorem 8.16] to parametrized traces. Specifically, we obtain agreement with the conventional mathematical definition of trace: given $f \in \mathsf{HS}^2(X, X)$, $\mathsf{tr}(f) = \sum_i \langle f(e_i), e_i \rangle$ for any choice of an orthonormal basis $(e_i)_i$, and $\mathsf{HS}^2(X, X)$ contains precisely those $f$ for which this sum is absolutely convergent independently of the basis.
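The basis independence of $\mathsf{tr}(f) = \sum_i \langle f(e_i), e_i \rangle$ is easy to check in finite dimension (our sketch; in finite dimension every operator is Hilbert-Schmidt, so the sum is always finite):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def trace_wrt(f, basis):
    # tr(f) = sum_i <f(e_i), e_i> over an orthonormal basis (e_i)
    def apply(f, v):
        return [dot(row, v) for row in f]
    return sum(dot(apply(f, e), e) for e in basis)

f = [[1.0, 2.0], [3.0, 4.0]]
std = [[1.0, 0.0], [0.0, 1.0]]
t = 1.1  # rotating the basis leaves the trace unchanged
rot = [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]
```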

### **7 Conclusions and Further Work**

We have presented and investigated a notion of abstract *guardedness* and guarded *traces*, focusing on foundational results and important classes of examples. We have distinguished a more specific notion of *ideal guardedness*, which in many respects appears to be better behaved than the unrestricted one, in particular ensuring closer agreement between structural and geometric guardedness. An unexpectedly prominent role is played by 'vacuous' guardedness, characterized by the absence of paths connecting unguarded inputs to guarded outputs; e.g., partial traces in Hilbert spaces [2] turn out to be based on this form of guardedness. Further research will concern a coherence theorem for guarded traced categories generalizing the well-known unguarded case [22,34], and a generalization of the Int-construction [22], which would relate guarded traced categories to a suitable guarded version of compact closed categories. Also, we plan to investigate guarded traced categories as a basis for generalized Hoare logics, extending and unifying previous work [5,15].

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Proper Semirings and Proper Convex Functors**

Ana Sokolova<sup>1</sup> and Harald Woracek<sup>2</sup>

<sup>1</sup> University of Salzburg, Salzburg, Austria ana.sokolova@cs.uni-salzburg.at <sup>2</sup> TU Vienna, Vienna, Austria harald.woracek@tuwien.ac.at

**Abstract.** Ésik and Maletti introduced the notion of a proper semiring and proved that some important (classes of) semirings – Noetherian semirings, the natural numbers – are proper. Properness matters, as the equivalence problem for weighted automata over a semiring which is proper and finitely and effectively presented is decidable. Milius generalised the notion of properness from a semiring to a functor. As a consequence, a semiring is proper if and only if its associated "cubic functor" is proper. Moreover, properness of a functor renders soundness and completeness proofs for axiomatizations of equivalent behaviour possible.

In this paper we provide a method for proving properness of functors, and instantiate it to cover both the known cases and several novel ones: (1) properness of the semirings of positive rationals and positive reals, via properness of the corresponding cubic functors; and (2) properness of two functors on (positive) convex algebras. The latter functors are important for axiomatizing trace equivalence of probabilistic transition systems. Our proofs rely on results that stretch all the way back to Hilbert and Minkowski.

**Keywords:** Proper semirings · Proper functors · Coalgebra · Weighted automata · Probabilistic transition systems

### **1 Introduction**

In this paper we deal with algebraic categories and deterministic weighted automata functors on them. Such categories are the target of generalized determinization [10,22,23] and enable coalgebraic modelling beyond sets. For example, non-deterministic automata, weighted, or probabilistic ones are coalgebraically modelled over the categories of join-semilattices, semimodules for a semiring, and convex sets, respectively. Moreover, expressions for axiomatizing behavior semantics often live in algebraic categories.

In order to prove completeness of such axiomatizations, the common approach [4,21,23] is to prove finality of a certain object in a category of coalgebras over an algebraic category. Proofs are significantly simplified if it suffices to verify finality only w.r.t. coalgebras carried by free finitely generated algebras, as those are the coalgebras that result from generalized determinization.

In recent work, Milius [16] proposed the notion of a proper functor on an algebraic category that provides a sufficient condition for this purpose. This notion is an extension of the notion of a proper semiring introduced by Esik and Maletti [8]: A semiring is proper if and only if its "cubic" functor is proper. A cubic functor is a functor <sup>S</sup> <sup>×</sup> (−)<sup>A</sup> where <sup>A</sup> is a finite alphabet and <sup>S</sup> is a free algebra with a single generator in the algebraic category. Cubic functors model deterministic weighted automata which are models of determinizations of non-deterministic and probabilistic transition systems.

Properness is the property that for any two states that are behaviourally equivalent in coalgebras with free finitely generated carriers, there is a zig-zag of homomorphisms (called a chain of simulations in the original works on weighted automata and proper semirings) that identifies the two states and whose nodes are all carried by free finitely generated algebras.

Even though the notion of properness is relatively new for a semiring and very new for a functor, results on properness of semirings can be found in more distant literature as well. Here is a brief history, to the best of our knowledge:


Properness of a semiring, together with the semiring being finitely and effectively presentable, yields decidability of the equivalence problem (decidability of trace equivalence) for weighted automata.

In this paper, motivated by the wish to prove properness of a certain functor $\widehat{F}$ on convex algebras used for axiomatizing trace semantics of probabilistic systems in [23], as well as by the open questions stated in [16, Example 3.19], we provide a framework for proving properness. We instantiate this framework on known cases like Noetherian semirings and $\mathbb{N}$ (with a zig-zag that is a span), and further prove new results of properness:

– The cubic functors for the semirings $\mathbb{Q}_+$ and $\mathbb{R}_+$, as well as the cubic functor $F_{[0,1]}$ on PCA, are proper; here, too, the zig-zag is a span.
– The functor $\widehat{F}$ on PCA is proper. This proof is the most involved and, interestingly, provides the only case where the zig-zag is not a span: it contains three intermediate nodes of which the middle one forms a span.

Our framework requires a proof of so-called *extension* and *reduction lemmas* in each case. While the extension lemma is a generic result that covers all cubic functors of interest, the reduction lemma is in all cases a nontrivial property intrinsic to the algebras under consideration. For the semiring of natural numbers it is a consequence of a result that we trace back to Hilbert; for the case of the convex algebra [0, 1] the result is due to Minkowski. In the case of $\widehat{F}$, we use Kakutani's set-valued fixpoint theorem.

It is an interesting question for future work whether these new properness results may lead to new complete axiomatizations of expressions for certain weighted automata.

The organization of the rest of the paper is as follows. In Sect. 2 we give some basic definitions and introduce the semirings, the categories, and the functors of interest. Section 3 provides the general framework as well as proofs of properness of the cubic functors. Sections 4, 5 and 6 lead us to properness of $\widehat{F}$ on PCA. For space reasons, we present the ideas of proofs and constructions in the main paper and defer all detailed proofs to the arXiv version [24].

### **2 Proper Functors**

We start with a brief introduction of the basic notions from algebra and coalgebra needed in the rest of the paper, as well as the important definition of proper functors [16]. We refer the interested reader to [9,11,20] for more details. We assume basic knowledge of category theory, see e.g. [14] or [24, Appendix A].

Let C be a category and F a C-endofunctor. The category Coalg(F) of F-*coalgebras* is the category having as objects pairs (X, c) where X is an object of C and c is a C-morphism from X to FX, and as morphisms f : (X, c) → (Y, d) those C-morphisms from X to Y that satisfy $Ff \circ c = d \circ f$.

All base categories C in this paper will be *algebraic categories*, i.e., categories $\mathsf{Set}^T$ of Eilenberg-Moore algebras of a finitary monad<sup>1</sup> in Set. Hence, all base categories are concrete with forgetful functor that is the identity on morphisms.

In such categories behavioural equivalence [13,25,26] can be defined as follows. Let (X, c) and (Y, d) be F-coalgebras and let x ∈ X and y ∈ Y. Then x and y are *behaviourally equivalent*, and we write x ∼ y, if there exists an F-coalgebra (Z, e) and Coalg(F)-morphisms f : (X, c) → (Z, e) and g : (Y, d) → (Z, e) with f(x) = g(y).

$$(X,c)\xrightarrow{\;f\;}(Z,e)\xleftarrow{\;g\;}(Y,d)$$

<sup>1</sup> The notions of monads and algebraic categories are central to this paper. We recall them in [24, Appendix A] to make the paper better accessible to all readers.

If there exists a final coalgebra in Coalg(F), and all functors considered in this paper will have this property, then two elements are behaviourally equivalent if and only if they have the same image in the final coalgebra. If we have a *zig-zag diagram* in Coalg(F)

$$(X,c)=(Z_0,e_0)\xrightarrow{f_1}(Z_1,e_1)\xleftarrow{f_2}(Z_2,e_2)\xrightarrow{f_3}\cdots\xrightarrow{f_{2n-1}}(Z_{2n-1},e_{2n-1})\xleftarrow{f_{2n}}(Z_{2n},e_{2n})=(Y,d) \tag{1}$$

which relates x with y in the sense that there exist elements $z_{2k}\in Z_{2k}$, $k=1,\ldots,n-1$, with (setting $z_0=x$ and $z_{2n}=y$)

$$f\_{2k}(z\_{2k}) = f\_{2k-1}(z\_{2k-2}), \quad k = 1, \ldots, n,$$

then x ∼ y.

We now recall the notion of a proper functor, introduced by Milius [16], which is central to this paper. It is very helpful for establishing completeness of regular expression calculi, cf. [16, Corollary 3.17].

**Definition 2.1.** Let <sup>T</sup> : Set <sup>→</sup> Set be a finitary monad with unit <sup>η</sup> and multiplication μ. A Set<sup>T</sup> -endofunctor F is *proper*, if the following statement holds.

For each pair $(TB_1, c_1)$ and $(TB_2, c_2)$ of F-coalgebras with $B_1$ and $B_2$ finite sets, and each two elements $b_1\in B_1$ and $b_2\in B_2$ with $\eta_{B_1}(b_1)\sim\eta_{B_2}(b_2)$, there exists a zig-zag (1) in Coalg(F) which relates $\eta_{B_1}(b_1)$ with $\eta_{B_2}(b_2)$, and whose nodes $(Z_j, e_j)$ all have free and finitely generated carrier.

This notion generalizes the notion of a proper semiring introduced by Esik and Maletti in [8, Definition 3.2], cf. [16, Remark 3.10].

*Remark 2.2.* In the definition of properness the condition that intermediate nodes have free *and* finitely generated carrier is necessary for nodes with incoming arrows (the nodes $Z_{2k-1}$ in (1)). For the intermediate nodes with outgoing arrows ($Z_{2k}$ in (1)), it is enough to require that their carrier is finitely generated. This follows since every F-coalgebra with finitely generated carrier is the image under an F-coalgebra morphism of an F-coalgebra with free and finitely generated carrier.

Moreover, note that zig-zags which start (or end) with incoming arrows instead of outgoing ones, can also be allowed since a zig-zag of this form can be turned into one of the form (1) by appending identity maps.

#### **Some Concrete Monads and Functors**

We deal with the following base categories.


For $n\in\mathbb{N}$, the free algebra with n generators in S-SMOD is the direct product $\mathbb{S}^n$, and in PCA it is the n-simplex $\Delta^n = \{(\xi_1,\ldots,\xi_n)\mid \xi_j\ge 0,\ \sum_{j=1}^n \xi_j\le 1\}$.

Concerning semimodule categories, we mainly deal with the semirings N, Q+, and R+, and their ring completions Z, Q, and R, together with the corresponding categories of S-semimodules.


We consider the following functors, where A is a fixed finite alphabet. Recall that we use the term *cubic functor* for the functor $T1\times(-)^A$ where T is a monad on Set. We chose the name since $T1\times(-)^A$ assigns to an object X a full direct product, i.e., a full cube.

– The *cubic functor* F<sup>S</sup> on S-SMOD, i.e., the functor acting as

$$\begin{aligned} F\_{\mathbb{S}}X &= \mathbb{S} \times X^A \text{ for } X \text{ object of } \mathbb{S}\text{-\sf{SMOD}},\\ F\_{\mathbb{S}}f &= \text{id}\_{\mathbb{S}} \times (f \circ -) \text{ for } f \colon X \to Y \text{ morphism of } \mathbb{S}\text{-\sf{SMOD}}.\end{aligned}$$

The underlying Set functors of cubic functors are also sometimes called deterministic-automata functors, see e.g. [10], as their coalgebras are deterministic weighted automata with output in the semiring.
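The deterministic-automata reading can be made concrete with a small sketch (hypothetical example data, not from the paper): a coalgebra c : X → S × X^A over S = N, restricted to the basis states of a free semimodule, is just an output weight plus one successor state per letter.

```python
# Hypothetical example data: a deterministic weighted automaton over the
# semiring S = N, i.e. a coalgebra c : X -> S x X^A of the cubic functor,
# given on the basis states x0, x1 of a free semimodule.
A = ["a", "b"]  # finite input alphabet

# c(x) = (output weight c_o(x), successor c_a(x) for every letter a)
c = {
    "x0": (1, {"a": "x1", "b": "x0"}),
    "x1": (0, {"a": "x0", "b": "x1"}),
}

def c_out(x):
    """Output component c_o : X -> S."""
    return c[x][0]

def c_step(x, a):
    """Transition component c_a : X -> X."""
    return c[x][1][a]

def weight(x, word):
    """Output weight reached from state x after reading `word`."""
    for letter in word:
        x = c_step(x, letter)
    return c_out(x)
```

Reading a word letter by letter and then emitting the output weight is how such an automaton assigns a weight in S to every word over A.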


Cubic functors are liftings of Set-endofunctors; in particular, they preserve surjective algebra homomorphisms. It is easy to see that also the functor $\widehat{F}$ preserves surjectivity, cf. [24, Lemma D.1]. This property is needed to apply the work of Milius, cf. [16, Assumptions 3.1].

*Remark 2.3.* We can now formulate precisely the connection between proper semirings and proper functors mentioned after Definition 2.1. A semiring S is proper in the sense of [8], if and only if for every finite input alphabet A the cubic functor F<sup>S</sup> on S-SMOD is proper.

We shall interchangeably think of direct products as sets of functions or as sets of tuples. Taking the viewpoint of tuples, the definition of $F_{\mathbb{S}}f$ reads as

$$(F_{\mathbb{S}}f)\big((o,(x_a)_{a\in A})\big) = \big(o,(f(x_a))_{a\in A}\big), \quad o\in\mathbb{S},\ x_a\in X \text{ for } a\in A.$$

<sup>2</sup> This functor was denoted $\widehat{G}$ in [23] where it was first studied in the context of axiomatization of trace semantics.

A coalgebra structure $c\colon X\to F_{\mathbb{S}}X$ writes as

$$c(x) = \big(c_o(x),(c_a(x))_{a\in A}\big), \quad x\in X,$$

and we use $c_o\colon X\to\mathbb{S}$ and $c_a\colon X\to X$ as generic notation for the components of the map c. More generally, we define $c_w\colon X\to X$ for any word $w\in A^*$ inductively as $c_\varepsilon=\mathrm{id}_X$ and $c_{wa}=c_a\circ c_w$, $w\in A^*$, $a\in A$.

The map from a coalgebra (X, c) into the final $F_{\mathbb{S}}$-coalgebra, the *trace map*, is then given as $\mathrm{tr}_c(x) = \big((c_o\circ c_w)(x)\big)_{w\in A^*}$ for $x\in X$. Behavioural equivalence for cubic functors is the kernel of the trace map.
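On a free finitely generated carrier the maps $c_a$ are linear, so they can be represented by matrices. A hypothetical numeric sketch (our own encoding, not the paper's notation) of computing the trace map word by word:

```python
# Hypothetical sketch: a cubic coalgebra on the free semimodule N^2 with
# A = {"a"}. The linear maps c_a are matrices, c_o is a row vector, and
# tr_c(x)(w) = (c_o . c_w)(x) with c_eps = id and c_{wa} = c_a . c_w.

def mat_vec(M, v):
    """Apply the matrix M to the vector v (over the semiring N)."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

c_o = [1, 2]                      # output weights (assumed example data)
c_mat = {"a": [[0, 1], [1, 0]]}   # transition matrix for the letter "a"

def trace(x, word):
    """The component tr_c(x)(word) of the trace map."""
    v = x
    for letter in word:
        v = mat_vec(c_mat[letter], v)
    return dot(c_o, v)
```

Two elements are behaviourally equivalent precisely when `trace` agrees on all words; here the basis vectors [1, 0] and [0, 1] already differ at the empty word.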

### **3 Properness of Cubic Functors**

Our proofs of properness in this section and in Sect. 6 below start from the following idea. Let S be a semiring, and assume we are given two $F_{\mathbb{S}}$-coalgebras which have free finitely generated carrier, say $(\mathbb{S}^{n_1}, c_1)$ and $(\mathbb{S}^{n_2}, c_2)$. Moreover, assume $x_1\in\mathbb{S}^{n_1}$ and $x_2\in\mathbb{S}^{n_2}$ are two elements having the same trace. For $j=1,2$, let $d_j\colon \mathbb{S}^{n_1}\times\mathbb{S}^{n_2}\to F_{\mathbb{S}}(\mathbb{S}^{n_1}\times\mathbb{S}^{n_2})$ be given by

$$d_j(y_1,y_2) = \Big(c_{j,o}(y_j),\big((c_{1,a}(y_1),c_{2,a}(y_2))\big)_{a\in A}\Big).$$

Denoting by $\pi_j\colon \mathbb{S}^{n_1}\times\mathbb{S}^{n_2}\to\mathbb{S}^{n_j}$ the canonical projections, both sides of the following diagram separately commute.

However, in general the maps d<sup>1</sup> and d<sup>2</sup> do not coincide.

The next lemma contains a simple observation: there exists a subsemimodule Z of $\mathbb{S}^{n_1}\times\mathbb{S}^{n_2}$, such that the restrictions of $d_1$ and $d_2$ to Z coincide and turn Z into an $F_{\mathbb{S}}$-coalgebra.

**Lemma 3.1.** *Let* Z *be the subsemimodule of* $\mathbb{S}^{n_1}\times\mathbb{S}^{n_2}$ *generated by the pairs* $(c_{1,w}(x_1), c_{2,w}(x_2))$ *for* $w\in A^*$*. Then* $d_1|_Z = d_2|_Z$ *and* $d_j(Z)\subseteq F_{\mathbb{S}}(Z)$*.*
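The generating pairs of Z can be enumerated word by word. A hypothetical sketch (our own example data) over S = N, with the two coalgebras given by transition matrices:

```python
# Hypothetical sketch: enumerate the generating pairs (c_{1,w}(x1), c_{2,w}(x2))
# of the subsemimodule Z of Lemma 3.1, for all words w up to a given length.
from itertools import product

A = ["a"]
M1 = {"a": [[1, 1], [0, 1]]}  # transitions of a coalgebra on S^2 (example data)
M2 = {"a": [[2]]}             # transitions of a coalgebra on S^1 (example data)

def step(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def generators(x1, x2, max_len):
    """Pairs (c_{1,w}(x1), c_{2,w}(x2)) for all words w with |w| <= max_len."""
    gens = []
    for n in range(max_len + 1):
        for w in product(A, repeat=n):
            v1, v2 = x1, x2
            for letter in w:
                v1 = step(M1[letter], v1)
                v2 = step(M2[letter], v2)
            gens.append((tuple(v1), tuple(v2)))
    return gens
```

Whether the subsemimodule generated by these pairs is finitely generated is exactly the question the rest of the section addresses.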

The significance of Lemma 3.1 in the present context is that it leads to the diagram (we denote d = d<sup>j</sup> |Z)

In other words, it leads to the zig-zag in Coalg(FS)

$$(\mathbb{S}^{n\_1}, c\_1) \xleftarrow{\pi\_1} (Z, d) \xrightarrow{\pi\_2} (\mathbb{S}^{n\_2}, c\_2) \tag{2}$$

This zig-zag relates x<sup>1</sup> with x<sup>2</sup> since (x1, x2) ∈ Z. If it can be shown that Z is always finitely generated, it will follow that F<sup>S</sup> is proper.

Let S be a Noetherian semiring, i.e., such that every S-subsemimodule of a finitely generated S-semimodule is itself finitely generated. Then Z is, as an S-subsemimodule of $\mathbb{S}^{n_1}\times\mathbb{S}^{n_2}$, finitely generated. We reobtain [8, Theorem 4.2].

#### **Corollary 3.2 (Esik–Maletti 2010).** *Every Noetherian semiring is proper.*

Our first main result is Theorem 3.3 below, where we show properness of the cubic functors $F_{\mathbb{S}}$ on S-SMOD, for S being one of the semirings N, Q+, R+, and of the cubic functor $F_{[0,1]}$ on PCA. The case of $F_{\mathbb{N}}$ is known from [2, Theorem 4]<sup>3</sup>; the case of $F_{[0,1]}$ is stated as an open problem in [16, Example 3.19].

**Theorem 3.3.** *The cubic functors* $F_{\mathbb{N}}$*,* $F_{\mathbb{Q}_+}$*,* $F_{\mathbb{R}_+}$*, and* $F_{[0,1]}$ *are proper.*

*In fact, for any two coalgebras with free finitely generated carrier and any two elements having the same trace, a zig-zag with free and finitely generated nodes relating those elements can be found, which is a span (has a single intermediate node with outgoing arrows).*

The proof proceeds via relating to the Noetherian case. It always follows the same scheme, which we now outline. Observe that the ring completion of each of N, Q+, R+ is Noetherian (for the last two it actually is a field), and that [0, 1] is the positive part of the unit ball in R.

*Step 1. The extension lemma:* We use an extension of scalars process to pass from the given category C to an associated category E-MOD with a Noetherian ring E. This is a general categorical argument.

<sup>3</sup> In [2] only a sketch of the proof is given, cf. [2, Sect. 3.3]. In this sketch one important point is not mentioned. Using the terminology of [2, Sect. 3.3]: it could a priori be possible that the size of the vectors in G and the size of G both oscillate.

To unify notation, we agree that S may also take the value [0, 1], and that T[0,1] is the monad of finitely supported subprobability distributions giving rise to the category PCA.


For the formulation of the extension lemma, recall that the starting category C is the Eilenberg-Moore category of the monad $T_{\mathbb{S}}$ and the target category E-MOD is the Eilenberg-Moore category of $T_{\mathbb{E}}$. We write $\eta^{\mathbb{S}}$ and $\mu^{\mathbb{S}}$ for the unit and multiplication of $T_{\mathbb{S}}$, and analogously for $T_{\mathbb{E}}$. We have $T_{\mathbb{S}}\le T_{\mathbb{E}}$ via the inclusion monad morphism $\iota\colon T_{\mathbb{S}}\Rightarrow T_{\mathbb{E}}$ given by $\iota_X(u)=u$, as $\eta^{\mathbb{E}} = \iota\circ\eta^{\mathbb{S}}$ and $\mu^{\mathbb{E}}\circ\iota\iota = \iota\circ\mu^{\mathbb{S}}$, where $\iota\iota := T_{\mathbb{E}}\iota\circ\iota = \iota\circ T_{\mathbb{S}}\iota$ (the two composites agree by naturality). Recall that a monad morphism $\iota\colon T_{\mathbb{S}}\Rightarrow T_{\mathbb{E}}$ defines a functor $M_\iota\colon \mathsf{Set}^{T_{\mathbb{E}}}\to\mathsf{Set}^{T_{\mathbb{S}}}$ which maps a $T_{\mathbb{E}}$-algebra $(X,\alpha_X)$ to $(X,\alpha_X\circ\iota_X)$ and is the identity on morphisms. Obviously, $M_\iota$ commutes with the forgetful functors $U_{\mathbb{S}}\colon \mathsf{Set}^{T_{\mathbb{S}}}\to\mathsf{Set}$ and $U_{\mathbb{E}}\colon \mathsf{Set}^{T_{\mathbb{E}}}\to\mathsf{Set}$, i.e., $U_{\mathbb{S}}\circ M_\iota = U_{\mathbb{E}}$.

**Definition 3.4.** Let $(X,\alpha_X)\in\mathsf{Set}^{T_{\mathbb{S}}}$ and $(Y,\alpha_Y)\in\mathsf{Set}^{T_{\mathbb{E}}}$ where $T_{\mathbb{S}}$ and $T_{\mathbb{E}}$ are monads with $T_{\mathbb{S}}\le T_{\mathbb{E}}$ via $\iota\colon T_{\mathbb{S}}\Rightarrow T_{\mathbb{E}}$. A Set-arrow $h\colon X\to Y$ is a $T_{\mathbb{S}}\le T_{\mathbb{E}}$-homomorphism from $(X,\alpha_X)$ to $(Y,\alpha_Y)$ if and only if the following diagram commutes (in Set),

where $\iota h$ denotes the map $\iota h := T_{\mathbb{E}}h\circ\iota_X = \iota_Y\circ T_{\mathbb{S}}h$ (the two composites agree by naturality). In other words, a $T_{\mathbb{S}}\le T_{\mathbb{E}}$-homomorphism from $(X,\alpha_X)$ to $(Y,\alpha_Y)$ is a morphism in $\mathsf{Set}^{T_{\mathbb{S}}}$ from $(X,\alpha_X)$ to $M_\iota(Y,\alpha_Y)$.

Now we can formulate the extension lemma.

**Proposition 3.5 (Extension Lemma).** *For every* $F_{\mathbb{S}}$*-coalgebra* $c\colon T_{\mathbb{S}}B\to F_{\mathbb{S}}(T_{\mathbb{S}}B)$ *with free finitely generated carrier* $T_{\mathbb{S}}B$ *for a finite set* B*, there exists an* $F_{\mathbb{E}}$*-coalgebra* $\tilde c\colon T_{\mathbb{E}}B\to F_{\mathbb{E}}(T_{\mathbb{E}}B)$ *with free finitely generated carrier* $T_{\mathbb{E}}B$ *such that the diagram*

$$\begin{array}{ccc} T_{\mathbb{S}}B & \xrightarrow{\;\iota_B\;} & T_{\mathbb{E}}B \\ c\downarrow\; & & \;\downarrow\tilde c \\ F_{\mathbb{S}}(T_{\mathbb{S}}B) & \xrightarrow{\;\iota_1\times(\iota_B)^A\;} & F_{\mathbb{E}}(T_{\mathbb{E}}B) \end{array}$$

*commutes, where the horizontal arrows (*$\iota_B$ *and* $\iota_1\times\iota_B^A$*) are* $T_{\mathbb{S}}\le T_{\mathbb{E}}$*-homomorphisms, and moreover they both amount to inclusion.*

*Step 2. The basic diagram:* Let $n_1,n_2\in\mathbb{N}$, let $B_j$ be the $n_j$-element set consisting of the canonical basis vectors of $\mathbb{E}^{n_j}$, and set $X_j = T_{\mathbb{S}}B_j$. Assume we are given $F_{\mathbb{S}}$-coalgebras $(X_1,c_1)$ and $(X_2,c_2)$, and elements $x_j\in X_j$ with $\mathrm{tr}_{c_1}x_1 = \mathrm{tr}_{c_2}x_2$.

The extension lemma provides $F_{\mathbb{E}}$-coalgebras $(\mathbb{E}^{n_j},\tilde c_j)$ with $\tilde c_j|_{X_j}=c_j$. Clearly, $\mathrm{tr}_{\tilde c_1}x_1 = \mathrm{tr}_{\tilde c_2}x_2$. Using the zig-zag diagram (2) in Coalg($F_{\mathbb{E}}$) and appending inclusion maps, we obtain what we call the *basic diagram*. In this diagram all solid arrows are arrows in E-MOD, and all dotted arrows are arrows in C. The horizontal dotted arrows denote the inclusion maps, and $\pi_j$ are the restrictions to Z of the canonical projections.

Commutativity of this diagram yields $d\big(\pi_j^{-1}(X_j)\big)\subseteq(F_{\mathbb{E}}\pi_j)^{-1}(F_{\mathbb{S}}X_j)$ for $j=1,2$. Now we observe the following properties of cubic functors.

**Lemma 3.6.** *We have* $F_{\mathbb{E}}X\cap F_{\mathbb{S}}Y = F_{\mathbb{S}}(X\cap Y)$*. Moreover, if* $Y_j\subseteq X_j$*, then* $(F_{\mathbb{E}}\pi_1)^{-1}(F_{\mathbb{S}}Y_1)\cap(F_{\mathbb{E}}\pi_2)^{-1}(F_{\mathbb{S}}Y_2) = F_{\mathbb{S}}(Y_1\times Y_2)$*.*

Using this, we obtain

$$\begin{aligned} d\left(Z \cap \left(X\_1 \times X\_2\right)\right) &\subseteq F\_{\mathbb{E}}Z \cap \left(F\_{\mathbb{E}}\pi\_1\right)^{-1}\left(F\_{\mathbb{S}}X\_1\right) \cap \left(F\_{\mathbb{E}}\pi\_2\right)^{-1}\left(F\_{\mathbb{S}}X\_2\right) \\ &= F\_{\mathbb{E}}Z \cap F\_{\mathbb{S}}\left(X\_1 \times X\_2\right) = F\_{\mathbb{S}}\left(Z \cap \left(X\_1 \times X\_2\right)\right). \end{aligned}$$

This shows that $Z\cap(X_1\times X_2)$ becomes an $F_{\mathbb{S}}$-coalgebra with the restriction $d|_{Z\cap(X_1\times X_2)}$. Again referring to the basic diagram, we have the following zig-zag in Coalg($F_{\mathbb{S}}$) (to shorten notation, we denote the restrictions of $d,\pi_1,\pi_2$ to $Z\cap(X_1\times X_2)$ again as $d,\pi_1,\pi_2$):

$$(X\_1, c\_1) \xleftarrow{\pi\_1} (Z \cap (X\_1 \times X\_2), d) \xrightarrow{\pi\_2} (X\_2, c\_2) \tag{3}$$

This zig-zag relates x<sup>1</sup> with x<sup>2</sup> since (x1, x2) ∈ Z ∩ (X<sup>1</sup> × X2).

*Step 3. The reduction lemma:* In view of the zig-zag (3), the proof of Theorem 3.3 can be completed by showing that Z∩(X1×X2) is finitely generated as an algebra in <sup>C</sup>. Since <sup>Z</sup> is a submodule of the finitely generated module <sup>E</sup><sup>n</sup><sup>1</sup> <sup>×</sup>E<sup>n</sup><sup>2</sup> over the Noetherian ring E, it is finitely generated as an E-module. The task thus is to show that being finitely generated is preserved when reducing scalars.

This is done by what we call the *reduction lemma*. Contrasting the extension lemma, the reduction lemma is not a general categorical fact, and requires specific proof in each situation.

**Proposition 3.7 (Reduction Lemma).** *Let* $n_1,n_2\in\mathbb{N}$*, let* $B_j$ *be the set consisting of the* $n_j$ *canonical basis vectors of* $\mathbb{E}^{n_j}$*, and set* $X_j = T_{\mathbb{S}}B_j$*. Moreover, let* Z *be an* E*-submodule of* $\mathbb{E}^{n_1}\times\mathbb{E}^{n_2}$*. Then* $Z\cap(X_1\times X_2)$ *is finitely generated as an algebra in* C*.*

### **4 A Subcubic Convex Functor**

Recall the following definition from [23, p. 309].

**Definition 4.1.** We introduce a functor $\widehat{F}$: PCA → PCA.

1. Let X be a PCA. Then
$$\widehat{F}X = \Big\{(o,\phi)\in[0,1]\times X^A \;\Big|\; \exists n_a\in\mathbb{N}.\ \exists p_{a,j}\in[0,1],\ x_{a,j}\in X \text{ for } j=1,\ldots,n_a,\ a\in A.\ o+\sum_{a\in A}\sum_{j=1}^{n_a}p_{a,j}\le 1,\ \phi(a)=\sum_{j=1}^{n_a}p_{a,j}x_{a,j}\Big\}.$$
2. Let X, Y be PCAs, and $f\colon X\to Y$ a convex map. Then $\widehat{F}f\colon\widehat{F}X\to\widehat{F}Y$ is the map $\widehat{F}f = \mathrm{id}_{[0,1]}\times(f\circ -)$.

For every X we have $\widehat{F}X\subseteq F_{[0,1]}X$, and for every $f\colon X\to Y$ we have $\widehat{F}f = (F_{[0,1]}f)|_{\widehat{F}X}$. For this reason, we think of $\widehat{F}$ as a *subcubic functor*.

The definition of $\widehat{F}$ can be simplified.

**Lemma 4.2.** *Let* X *be a* PCA*, then*

$$\widehat{F}X = \Big\{(o,f)\in[0,1]\times X^A \;\Big|\; \exists p_a\in[0,1],\ x_a\in X \text{ for } a\in A.\ o+\sum_{a\in A}p_a\le 1,\ f(a)=p_a x_a\Big\}.$$

From this representation it is obvious that $\widehat{F}$ is monotone in the sense that $X\subseteq Y$ implies $\widehat{F}X\subseteq\widehat{F}Y$. Note, however, that $\widehat{F}$ does not preserve direct products.

The functor $\widehat{F}$ can be described with the help of a geometric notion, namely the Minkowski functional of X. Before we can state this fact, we have to make a brief digression to explain this notion and its properties.

**Definition 4.3.** Let $X\subseteq\mathbb{R}^n$ be a PCA. The *Minkowski functional* of X is the map $\mu_X\colon\mathbb{R}^n\to[0,\infty]$ defined as $\mu_X(x) = \inf\{t>0\mid x\in tX\}$, where the infimum of the empty set is understood as $\infty$.

Minkowski functionals, sometimes also called *gauge*, are a central and exhaustively studied notion in convex geometry, see, e.g., [19, p. 34] or [18, p. 28].

We list some basic properties whose proof can be found in the mentioned textbooks.


The set X can almost be recovered from μX.


*Example 4.4.* As two simple examples, consider the n-simplex $\Delta^n\subseteq\mathbb{R}^n$ and a convex cone $C\subseteq\mathbb{R}^n$. Then (here $\ge$ denotes the product order on $\mathbb{R}^n$)

$$\mu_{\Delta^n}(x) = \begin{cases}\sum_{j=1}^n \xi_j, & x=(\xi_1,\ldots,\xi_n)\ge 0,\\ \infty, & \text{otherwise},\end{cases}\qquad \mu_C(x) = \begin{cases}0, & x\in C,\\ \infty, & \text{otherwise}.\end{cases}$$

Observe that $\Delta^n = \{x\in\mathbb{R}^n\mid\mu_{\Delta^n}(x)\le 1\}$.
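A hypothetical numeric sketch of these two gauges (the cone is passed as a membership predicate; the names are ours):

```python
import math

def mu_simplex(x):
    """Minkowski functional of the n-simplex: the coordinate sum if x >= 0."""
    if all(xi >= 0 for xi in x):
        return sum(x)
    return math.inf

def mu_cone(x, in_cone):
    """Minkowski functional of a convex cone: 0 on the cone, infinity off it."""
    return 0.0 if in_cone(x) else math.inf
```

In particular, `mu_simplex(x) <= 1` recovers membership in the simplex, matching the observation above.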

Another illustrative example is given by general pyramids in a euclidean space. This example will play an important role later on. 

*Example 4.5.* For $u\in\mathbb{R}^n$ consider the set

$$X = \{x\in\mathbb{R}^n \mid x\ge 0 \text{ and } (x,u)\le 1\},$$

where $(\cdot,\cdot)$ denotes the euclidean scalar product on $\mathbb{R}^n$. The set X is the intersection of the cone $\mathbb{R}^n_+$ with the half-space given by the inequality $(x,u)\le 1$, hence it is convex and contains 0. Thus X is a PCA.

Let us first assume that u is strictly positive, i.e., u ≥ 0 and no component of u equals zero. Then X is a pyramid (in 2-dimensional space, a triangle).

The n-simplex Δ<sup>n</sup> is the pyramid obtained using u = (1,..., 1).

The Minkowski functional of the pyramid X associated with u is

$$\mu_X(x) = \begin{cases}(x,u), & x\ge 0,\\ \infty, & \text{otherwise}.\end{cases}$$

Write $u = \sum_{j=1}^n \alpha_j e_j$, where $e_j$ is the j-th canonical basis vector, and set $y_j = \frac{1}{\alpha_j}e_j$. Clearly, $\{y_1,\ldots,y_n\}$ is linearly independent. Each vector $x = \sum_{j=1}^n \xi_j e_j$ can be written as $x = \sum_{j=1}^n(\xi_j\alpha_j)y_j$, and this is a subconvex combination if and only if $\xi_j\ge 0$ and $\sum_{j=1}^n \xi_j\alpha_j\le 1$, i.e., if and only if $x\in X$. Thus X is generated by $\{y_1,\ldots,y_n\}$ as a PCA.

The linear map given by the diagonal matrix made up of the $\alpha_j$'s induces a bijection of X onto $\Delta^n$, and maps the $y_j$'s to the corner points of $\Delta^n$. Hence, X is free with basis $\{y_1,\ldots,y_n\}$.

If u is not strictly positive, the situation changes drastically. Then X is not finitely generated as a PCA, because it is unbounded whereas the subconvex hull of a finite set is certainly bounded.
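A hypothetical numeric check of the computation in Example 4.5 (helper names are ours; u strictly positive, dimension 2):

```python
# Hypothetical sketch of Example 4.5: for strictly positive u, membership in
# the pyramid X = {x >= 0, (x,u) <= 1} coincides with the coefficients of x
# in the basis y_j = e_j / u_j forming a subconvex combination.

def in_pyramid(x, u):
    return all(xi >= 0 for xi in x) and sum(xi * ui for xi, ui in zip(x, u)) <= 1

def subconvex_coeffs(x, u):
    """Coefficients of x = sum_j (xi_j u_j) y_j in the basis y_j = e_j / u_j."""
    return [xi * ui for xi, ui in zip(x, u)]

def is_subconvex(coeffs):
    return all(c >= 0 for c in coeffs) and sum(coeffs) <= 1
```

For u = (2, 4) both tests agree, e.g. on x = (0.25, 0.125), illustrating that X is exactly the subconvex hull of the basis vectors $y_j$.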

We return to the functor $\widehat{F}$.

**Lemma 4.6.** *Let* $X\subseteq\mathbb{R}^n$ *be a* PCA*, and assume that* X *is compact. Then*

$$\widehat{F}X = \Big\{(o,\phi)\in\mathbb{R}\times(\mathbb{R}^n)^A \;\Big|\; o\ge 0,\ o+\sum_{a\in A}\mu_X(\phi(a))\le 1\Big\}.$$
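Lemma 4.6 suggests a direct membership test for $\widehat{F}X$ when X is compact. A hypothetical sketch with X = Δ², so the gauge is the simplex gauge of Example 4.4 (names are ours):

```python
import math

def mu_simplex(x):
    """Gauge of the simplex: coordinate sum if x >= 0, infinity otherwise."""
    return sum(x) if all(xi >= 0 for xi in x) else math.inf

def in_Fhat(o, phi, mu):
    """Membership of (o, phi) per Lemma 4.6, for a compact X with gauge mu."""
    return o >= 0 and o + sum(mu(phi[a]) for a in phi) <= 1
```

For instance, with A = {a, b}, the pair (0.25, {a ↦ (0.125, 0.25), b ↦ (0.25, 0.125)}) lies in $\widehat{F}\Delta^2$, since 0.25 + 0.375 + 0.375 ≤ 1.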

In the following we use the elementary fact that every convex map has a linear extension.

**Lemma 4.7.** *Let* $V_1, V_2$ *be vector spaces, let* $X\subseteq V_1$ *be a* PCA*, and let* $c\colon X\to V_2$ *be a convex map. Then* c *has a linear extension* $\tilde c\colon V_1\to V_2$*. If* $\operatorname{span}X = V_1$*, this extension is unique.*

Rescaling in this representation of $\widehat{F}X$ leads to a characterisation of $\widehat{F}$-coalgebra maps. We give a slightly more general statement.

**Corollary 4.8.** *Let* $X, Y\subseteq\mathbb{R}^n$ *be* PCA*s, and assume that* X *and* Y *are compact. Further, let* $c\colon X\to\mathbb{R}_+\times(\mathbb{R}^n)^A$ *be a convex map, and let* $\tilde c\colon\mathbb{R}^n\to\mathbb{R}\times(\mathbb{R}^n)^A$ *be a linear extension of* c*. Then* $c(X)\subseteq\widehat{F}Y$ *if and only if*

$$
\widetilde{c}\_o(x) + \sum\_{a \in A} \mu\_Y(\widetilde{c}\_a(x)) \le \mu\_X(x), \quad x \in \mathbb{R}^n. \tag{4}
$$

### **5 An Extension Theorem for $\widehat{F}$-coalgebras**

In this section we establish an extension theorem for $\widehat{F}$-coalgebras.


**Theorem 5.1.** *Let* (X, c) *be an* $\widehat{F}$*-coalgebra whose carrier* X *is a compact subset of a euclidean space* $\mathbb{R}^n$ *with* $\Delta^n\subseteq X\subseteq\mathbb{R}^n_+$*. Assume that the output map* $c_o$ *does not vanish on invariant coordinate hyperplanes in the sense that (*$e_j$ *denotes again the* j*-th canonical basis vector in* $\mathbb{R}^n$*)*

$$\nexists\, I\subseteq\{1,\ldots,n\}.\quad I\neq\emptyset,\quad c_o(e_j)=0,\ j\in I,\quad c_a(e_j)\in\operatorname{span}\{e_i\mid i\in I\},\ a\in A,\ j\in I. \tag{5}$$

*Then there exists an* $\widehat{F}$*-coalgebra* (Y, d)*, such that* $X\subseteq Y\subseteq\mathbb{R}^n_+$*, the inclusion map* $\iota\colon X\to Y$ *is a* Coalg($\widehat{F}$)*-morphism, and* Y *is the subconvex hull of* n *linearly independent vectors (in particular,* Y *is free with* n *generators).*

The idea of the proof can be explained by geometric intuition. Say, we have an $\widehat{F}$-coalgebra (X, c) of the stated form, and let $\tilde c\colon\mathbb{R}^n\to\mathbb{R}\times(\mathbb{R}^n)^A$ be the linear extension of c to all of $\mathbb{R}^n$, cf. Lemma 4.7.

Remembering that pyramids are free and finitely generated, we will be done if we find a pyramid Y ⊇ X which is mapped into F̂Y by c̃, i.e., c̃(Y) ⊆ F̂Y.

This task can be reformulated as follows: For each pyramid Y_1 containing X let P(Y_1) be the set of all pyramids Y_2 containing X, such that c̃(Y_2) ⊆ F̂Y_1. If we find Y with Y ∈ P(Y), we are done.

Existence of Y can be established by applying a fixed point principle for set-valued maps. The result sufficient for our present level of generality is Kakutani's generalisation [12, Corollary] of Brouwer's fixed point theorem.

### **6 Properness of F̂**

In this section we give the second main result of the paper.

**Theorem 6.1.** *The functor* F̂ *is proper.*

*In fact, for any two given coalgebras with free finitely generated carrier and any two elements having the same trace, a zig-zag with free and finitely generated nodes relating those elements can be found, which has three intermediate nodes, the middle one forming a span.*

We try to follow the proof scheme familiar from the cubic case. Assume we are given two F̂-coalgebras with free finitely generated carrier, say (Δ^{n_1}, c_1) and (Δ^{n_2}, c_2), and elements x_1 ∈ Δ^{n_1} and x_2 ∈ Δ^{n_2} having the same trace. Since F̂Δ^{n_j} ⊆ R × (R^{n_j})^A we can apply Lemma 4.7 and obtain F_R-coalgebras (R^{n_j}, c̃_j) with c̃_j|_{Δ^{n_j}} = c_j, leading to the basic diagram.

At this point the line of argument known from the cubic case breaks: it is *not* granted that Z ∩ (Δ^{n_1} × Δ^{n_2}) becomes an F̂-coalgebra with the restriction of d.

The substitute for Z ∩ (Δ^{n_1} × Δ^{n_2}) suitable for proceeding one step further is given by the following lemma, where we tacitly identify R^{n_1} × R^{n_2} with R^{n_1+n_2}.

**Lemma 6.2.** *We have* d(Z ∩ 2Δ^{n_1+n_2}) ⊆ F̂(Z ∩ 2Δ^{n_1+n_2})*.*

This shows that Z ∩ 2Δ^{n_1+n_2} becomes an F̂-coalgebra with the restriction of d. Still, we cannot return to the usual line of argument: it is *not* granted that π_j(Z ∩ 2Δ^{n_1+n_2}) ⊆ Δ^{n_j}. This forces us to introduce additional nodes to produce a zig-zag in Coalg(F̂). These additional nodes are given by the following lemma. There co(−) denotes the convex hull.

**Lemma 6.3.** *Set* Y_j = co(Δ^{n_j} ∪ π_j(Z ∩ 2Δ^{n_1+n_2}))*. Then* c̃_j(Y_j) ⊆ F̂Y_j*.*

This shows that Y_j becomes an F̂-coalgebra with the restriction of c̃_j, and we obtain the zig-zag

$$
(\Delta^{n_1}, c_1) \xrightarrow{\subseteq} (Y_1, \tilde{c}_1) \xleftarrow{\pi_1} (Z \cap 2\Delta^{n_1+n_2}, d) \xrightarrow{\pi_2} (Y_2, \tilde{c}_2) \xleftarrow{\supseteq} (\Delta^{n_2}, c_2)
$$

This zig-zag relates x_1 and x_2 since (x_1, x_2) ∈ Z ∩ 2Δ^{n_1+n_2}.

Minkowski's Theorem together with the argument from [24, Lemma B.8] shows that the middle node has finitely generated carrier. The two nodes with incoming arrows are, as convex hulls of two finitely generated PCAs, of course also finitely generated. But in general they will not be free (and this is essential; remember Remark 2.2). Now Theorem 5.1 comes into play.

**Lemma 6.4.** *Assume that each of* (Δ^{n_1}, c_1) *and* (Δ^{n_2}, c_2) *satisfies the following condition:*

$$
\nexists\, I \subseteq \{1, \ldots, n_j\}\colon\quad I \neq \emptyset, \quad c_{j,o}(e_k) = 0,\ k \in I, \quad c_{j,a}(e_k) \in \operatorname{co}(\{e_i \mid i \in I\} \cup \{0\}),\ a \in A,\ k \in I. \tag{6}
$$

*Then there exist free finitely generated* PCA*s* U_j *with* Y_j ⊆ U_j ⊆ R^{n_j}_+ *which satisfy* c̃_j(U_j) ⊆ F̂U_j*.*

This shows that U_j, under the additional assumption (6) on (Δ^{n_j}, c_j), becomes an F̂-coalgebra with the restriction of c̃_j. Thus we have a zig-zag in Coalg(F̂) relating x_1 and x_2 whose nodes with incoming arrows are free and finitely generated, and whose node with outgoing arrows, (Z ∩ 2Δ^{n_1+n_2}, d) with projections π_1 and π_2, is finitely generated.

Removing the additional assumption on (Δ^{n_j}, c_j) is an easy exercise.

**Lemma 6.5.** *Let* (Δ^n, c) *be an* F̂*-coalgebra. Assume that* I *is a nonempty subset of* {1, ..., n} *with*

$$
c_o(e_k) = 0, \ k \in I \quad \text{and} \quad c_a(e_k) \in \text{co}\left(\{e_i \mid i \in I\} \cup \{0\}\right), \ a \in A,\ k \in I. \tag{7}
$$

*Let* X *be the free* PCA *with basis* {e_k | k ∈ {1, ..., n} \ I}*, and let* f : Δ^n → X *be the* PCA*-morphism with* f(e_k) = 0 *if* k ∈ I *and* f(e_k) = e_k *if* k ∉ I*. Further, let* g : X → [0, 1] × X^A *be the* PCA*-morphism with*

$$
g(e_k) = \left(c_o(e_k), \left(f(c_a(e_k))\right)_{a \in A}\right), \quad k \in \{1, \dots, n\} \setminus I.
$$

*Then* (X, g) *is an* F̂*-coalgebra, and* f *is an* F̂*-coalgebra morphism of* (Δ^n, c) *onto* (X, g)*.*

**Corollary 6.6.** *Let* (Δ^n, c) *be an* F̂*-coalgebra. Then there exist* k ≤ n *and an* F̂*-coalgebra* (Δ^k, g)*, such that* (Δ^k, g) *satisfies the assumption in Lemma 6.4 and such that there exists an* F̂*-coalgebra map* f *of* (Δ^n, c) *onto* (Δ^k, g)*.*

The proof of Theorem 6.1 is now finished by putting together what we showed so far. Starting with F̂-coalgebras (Δ^{n_j}, c_j) without any additional assumptions, and elements x_j ∈ Δ^{n_j} having the same trace, we first reduce by means of Corollary 6.6 and then apply Lemma 6.4. This gives a zig-zag as required, with middle node (Z ∩ 2Δ^{k_1+k_2}, d).

**Acknowledgements.** We thank the anonymous reviewers for many valuable comments, in particular for reminding us of a categorical property that shortened the proof of the extension lemma.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **From Symmetric Pattern-Matching to Quantum Control**

Amr Sabry<sup>1</sup>, Benoît Valiron<sup>2(B)</sup>, and Juliana Kaizer Vizzotto<sup>3</sup>

<sup>1</sup> Indiana University, Bloomington, IN, USA
sabry@indiana.edu
<sup>2</sup> LRI, CentraleSupélec, Université Paris-Saclay, Orsay, France
benoit.valiron@lri.fr
<sup>3</sup> Universidade Federal de Santa Maria, Santa Maria, Brazil
juvizzotto@inf.ufsm.br

**Abstract.** One perspective on quantum algorithms is that they are classical algorithms having access to a special kind of memory with exotic properties. This perspective suggests that, even in the case of quantum algorithms, the control flow notions of sequencing, conditionals, loops, and recursion are entirely classical. There is, however, another notion of control flow that is itself quantum. The notion of quantum conditional expression is reasonably well understood: the execution of the two expressions becomes itself a superposition of executions. The quantum counterpart of loops and recursion is, however, not believed to be meaningful in its most general form.

In this paper, we argue that, under the right circumstances, a reasonable notion of quantum loops and recursion is possible. To this aim, we first propose a classical, typed, reversible language with lists and fixpoints. We then extend this language to the *closed* quantum domain (without measurements) by allowing linear combinations of terms and restricting fixpoints to structurally recursive fixpoints whose termination proofs match the proofs of convergence of sequences in infinite-dimensional Hilbert spaces. We additionally give an operational semantics for the quantum language in the spirit of algebraic lambda-calculi and illustrate its expressiveness by modeling several common unitary operations.

### **1 Introduction**

The control flow of a program describes how its elementary operations are organized along the execution. Usual primitive control mechanisms are sequences, tests, iteration and recursion. Elementary operations placed in sequence are executed in order. Tests allow conditionally executing a group of operations and changing the course of the execution of the program. Finally, iteration gives the

B. Valiron and J. K. Vizzotto—Partially funded by FoQCoss STIC AmSud project - STIC-AmSUD/Capes - Foundations of Quantum Computation: Syntax and Semantics.

C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 348–364, 2018. https://doi.org/10.1007/978-3-319-89366-2\_19

possibility to iterate a process an arbitrary number of times and recursion generalizes iteration to automatically manage the history of the operations performed during iteration. The structure of control flow for conventional (classical) computation is well-understood. In the case of *quantum* computation, control flow is still subject to debate. This paper proposes a working notion of quantum control in closed quantum systems, shedding new light on the problem, and clarifying several of the previous concerns.

*Quantum Computation.* A good starting point for understanding quantum computation is to consider classical circuits over *bits* but replacing the bits with *qubits*, which are intuitively superpositions of bits weighted by complex number amplitudes. Computationally, a qubit is an abstract data type governed by the laws of quantum physics, whose values are normalized vectors of complex numbers in the Hilbert space **C**<sup>2</sup> (modulo a global phase). By choosing an orthonormal basis, say the classical bits tt and ff, a qubit can be regarded as a complex linear combination, α tt + β ff, where α and β are complex numbers such that |α|<sup>2</sup> + |β|<sup>2</sup> = 1. This generalizes naturally to multiple qubits: the state of a system of n qubits is a vector in the Hilbert space (**C**<sup>2</sup>)<sup>⊗n</sup>.
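The normalization constraint and the tensor structure of multi-qubit states can be checked numerically. The following is an illustrative sketch using NumPy (not part of the paper); the variable names are our own:

```python
import numpy as np

# A qubit: alpha*tt + beta*ff, a normalized vector in C^2.
alpha, beta = 1 / np.sqrt(2), 1j / np.sqrt(2)
qubit = np.array([alpha, beta])

# Normalization constraint: |alpha|^2 + |beta|^2 = 1.
assert np.isclose(np.sum(np.abs(qubit) ** 2), 1.0)

# The state of n qubits lives in (C^2)^(tensor n): the Kronecker
# product of two 2-dimensional states is a 4-dimensional state.
two_qubits = np.kron(qubit, qubit)
assert two_qubits.shape == (4,)
assert np.isclose(np.sum(np.abs(two_qubits) ** 2), 1.0)
```

The Kronecker product realizes the tensor product of state vectors, so a register of n qubits has 2^n amplitudes.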

The operations one can perform on a quantum memory are of two kinds: quantum gates and measurements. Quantum gates are unitary operations that are "purely quantum" in the sense that they modify the quantum memory without giving any feedback to the outside world: the quantum memory is viewed as a *closed system*. A customary graphical representation for these operations is the *quantum circuit*, akin to conventional boolean circuits: wires represent qubits while boxes represent operations to perform on them. One of the peculiar aspects of quantum computation is that the state of a qubit is non-duplicable [1], a result known as the *no-cloning theorem*. A corollary is that a quantum circuit is a very simple kind of circuit: wires neither split nor merge.

Measurement is a fundamentally different kind of operation: it queries the state of the quantum memory and returns a classical result. Measuring the state of a quantum bit is a probabilistic and destructive operation: it produces a classical answer with a probability that depends on the amplitudes α, β in the state of the qubit while projecting this state onto tt or ff, based on the result.
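The probabilistic, destructive behaviour of measurement can be illustrated by a small simulation. This is a sketch of our own (the function `measure` and the state representation are illustrative assumptions, not the paper's formalism):

```python
import numpy as np

rng = np.random.default_rng(0)

def measure(qubit, rng):
    """Measure a qubit [alpha, beta]: return (outcome, post-measurement state).

    The outcome is tt (True) with probability |alpha|^2, and the state
    collapses onto the corresponding basis vector.
    """
    p_tt = abs(qubit[0]) ** 2
    if rng.random() < p_tt:
        return True, np.array([1.0, 0.0])
    return False, np.array([0.0, 1.0])

qubit = np.array([np.sqrt(0.25), np.sqrt(0.75)])  # P(tt) = 0.25
outcomes = [measure(qubit, rng)[0] for _ in range(10000)]
freq = sum(outcomes) / len(outcomes)
assert abs(freq - 0.25) < 0.05  # empirical frequency near |alpha|^2
```

Repeated measurement of identically prepared qubits recovers the amplitude statistics, but any single measurement destroys the superposition.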

For a more detailed introduction to quantum computation, we refer the reader to recent textbooks (e.g., [2]).

*Control Flow in Quantum Computation.* In the context of quantum programming languages, there is a well-understood notion of control flow: the so-called *classical control flow*. A quantum program can be seen as the construction, manipulation and evaluation of quantum circuits [3,4]. In this setting, circuits are simply considered as special kinds of data without much computational content, and programs are ruled by regular classical control.

One can however consider the circuit being manipulated as a program in its own right: a particular sequence of execution on the quantum memory is then seen as a closed system. One can then try to derive a notion of *quantum control* [5], with "quantum tests" and "quantum loops". Quantum tests are a bit tricky to perform [5,6] but they essentially correspond to well-understood controlled operations. The situation with quantum loops is more subtle [6,7]. First, a hypothetical quantum loop *must* terminate. Indeed, a non-terminating quantum loop would entail an infinite quantum circuit, and this concept has so far no meaning. Second, the interaction of quantum loops with measurement is problematic: it is known that the canonical model of *open* quantum computation based on superoperators [8,9] is incompatible with such quantum control [6]. Finally, the mathematical operator corresponding to a quantum loop would need to act on an infinite-dimensional Hilbert space and the question of mixing programming languages with infinitary Hilbert spaces is still an unresolved issue.

*Our Contribution.* In this paper, we offer a novel solution to the question of quantum control: we define a purely quantum language, inspired by Theseus [10], featuring tests and fixpoints in the presence of lists. More precisely, we propose (1) a typed, reversible language, extensible to linear combinations of terms, with a reduction strategy akin to algebraic lambda-calculi [11–13]; (2) a model for the language based on unitary operators over infinite-dimensional Hilbert spaces, simplifying the Fock space model of Ying [7]. This model captures lists, tests, and structurally recursive fixpoints. We therefore settle two longstanding issues. (1) We offer a solution to the problem of quantum loops, with the use of *terminating*, *structurally recursive*, *purely quantum* fixpoints. We dodge previously noted concerns (e.g., [6]) by staying in the closed quantum setting and answer the problem of the external system of quantum "coins" [7] with the use of lists. (2) By using a linear language based on patterns and clauses, we give an extensible framework for reconciling algebraic calculi with quantum computation [11,12,16].

In the remainder of the paper, we first introduce the key idea underlying our classical reversible language in a simple first-order setting. We then generalize the setting to allow second-order functions, recursive types (e.g., lists), and fixpoints. After illustrating the expressiveness of this classical language, we adapt it to the quantum domain and give a semantics to the resulting quantum language in infinite-dimensional Hilbert spaces. Technical material that would interrupt the flow or that is somewhat complementary has been relegated to an extended version of the paper [17].

### **2 Pattern-Matching Isomorphisms**

The most elementary control structure in a programming language is the ability to conditionally execute one of several possible code fragments. Expressing such an abstraction using predicates and nested **if**-expressions makes it difficult for both humans and compilers to reason about the control flow structure. Instead, in modern functional languages, this control flow paradigm is elegantly expressed using *pattern-matching*. This approach yields code that is not only more concise and readable but also enables the compiler to easily verify two crucial properties: (i) non-overlapping patterns and (ii) exhaustive coverage of a datatype using a collection of patterns. Indeed most compilers for functional languages perform these checks, warning the user when they are violated. At a more fundamental level, e.g., in type theories and proof assistants, these properties are actually necessary for correct reasoning about programs. Our first insight, explained in this section, is that these properties, perhaps surprisingly, are sufficient to produce a simple and intuitive first-order reversible programming language.

```
f :: Either Int Int -> a
f (Left 0)     = undefined
f (Left (n+1)) = undefined
f (Right n)    = undefined
```
**Fig. 1.** A skeleton

```
g :: (Bool,Int) -> a
g (False,n)  = undefined
g (True,0)   = undefined
g (True,n+1) = undefined
```
**Fig. 2.** Another skeleton

```
h :: Either Int Int <-> (Bool,Int)
h (Left 0)     = (True,0)
h (Left (n+1)) = (False,n)
h (Right n)    = (True,n+1)
```
**Fig. 3.** An isomorphism
#### **2.1 An Example**

We start with a small illustrative example, written in a Haskell-like syntax. Figure 1 gives the skeleton of a function f that accepts a value of type Either Int Int; the patterns on the left-hand side exhaustively cover every possible incoming value and are non-overlapping. Similarly, Fig. 2 gives the skeleton for a function g that accepts a value of type (Bool,Int); again the patterns on the left-hand side exhaustively cover every possible incoming value and are non-overlapping. Now we claim that since the types Either Int Int and (Bool,Int) are isomorphic, we can combine the patterns of f and g into *symmetric pattern-matching clauses* to produce a reversible function between the types Either Int Int and (Bool,Int). Figure 3 gives one such function; there, we suggestively use <-> to indicate that the function can be executed in either direction. This reversible function is obtained by simply combining the non-overlapping exhaustive patterns on the two sides of a clause. In order to be well-formed in either direction, these clauses are subject to the constraint that each variable occurring on one side must occur exactly once on the other side (and with the same type). Thus it is acceptable to swap the second and third right-hand sides of h but not the first and second ones.
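To make the bidirectionality of h concrete, here is an informal Python sketch (the tagged-tuple encodings of `Either Int Int` and `(Bool,Int)` values are our own illustrative choices, not the paper's syntax):

```python
# The isomorphism h of Fig. 3, with Left n / Right n encoded as
# ('L', n) / ('R', n) and (Bool,Int) as a plain Python pair.
def h(v):
    tag, n = v
    if tag == 'L' and n == 0:
        return (True, 0)
    if tag == 'L':               # Left (n+1), so the bound n is one less
        return (False, n - 1)
    return (True, n + 1)         # Right n

def h_inv(v):
    b, n = v
    if b and n == 0:
        return ('L', 0)
    if not b:                    # (False, n) came from Left (n+1)
        return ('L', n + 1)
    return ('R', n - 1)          # (True, n+1) came from Right n

# Non-overlapping, exhaustive clauses make h a bijection:
for v in [('L', 0), ('L', 5), ('R', 3)]:
    assert h_inv(h(v)) == v
for w in [(True, 0), (False, 2), (True, 4)]:
    assert h(h_inv(w)) == w
```

Running the clauses right-to-left is exactly `h_inv`; the linearity constraint on variables is what guarantees each direction is a function.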

#### **2.2 Terms and Types**

We present a formalization of the ideas presented above using a simple typed first-order reversible language. The language is two-layered. The first layer contains values, which also play the role of patterns. These values are constructed from variables ranged over by x and the introduction forms for the finite types a, b constructed from the unit type and sums and products of types. The second layer contains collections of pattern-matching clauses that denote isomorphisms of type a ↔ b. Computations are chained applications of isomorphisms to values:

$$\begin{array}{lclclcl} \text{(Value types)} & a,b & ::= & \mathbb{1} \mid \ a \oplus b \mid \ a \otimes b\\ \text{(Iso types)} & T & ::= & a \leftrightarrow b\\ \text{(Values)} & v & ::= \text{ ()} \mid x \mid \text{ inj}\_l \, v \mid \text{ inj}\_r \, v \mid \langle v\_1, v\_2 \rangle\\ \text{(Isos)} & & \omega & ::= \{ \mid \ v\_1 \leftrightarrow v\_1' \mid \, v\_2 \leftrightarrow v\_2' \ldots \text{ } \} \\ \text{(Terms)} & t & ::= & v \mid \omega t \end{array}$$

The typing rules are defined using two judgments: Δ ⊢_v v : a for typing values (or *patterns*) and terms, and ⊢_ω ω : a ↔ b for typing collections of pattern-matching clauses denoting an isomorphism. As is customary, we write a_1 ⊗ a_2 ⊗ ··· ⊗ a_n for the nested product (a_1 ⊗ a_2) ⊗ ··· ⊗ a_n, and similarly ⟨x_1, x_2, ..., x_n⟩ for the corresponding nested pair.

The typing rules for values are the expected ones. The only subtlety is the fact that they are linear: because values act as patterns, we forbid the repetition of variables. A typing context <sup>Δ</sup> is a set of typed variables <sup>x</sup><sup>1</sup> : <sup>a</sup>1,...,xn : <sup>a</sup>n. A value typing judgment is valid if it can be derived from the following rules:

$$
\overline{\vdash_v () : \mathbb{1}} \qquad \overline{x : a \vdash_v x : a} \qquad \frac{\Delta_1 \vdash_v v_1 : a \quad \Delta_2 \vdash_v v_2 : b}{\Delta_1, \Delta_2 \vdash_v \langle v_1, v_2 \rangle : a \otimes b}
$$

$$
\frac{\Delta \vdash_v v : a}{\Delta \vdash_v \mathtt{inj}_l\, v : a \oplus b} \qquad \frac{\Delta \vdash_v v : b}{\Delta \vdash_v \mathtt{inj}_r\, v : a \oplus b}
$$

The typing rule for term construction is simple and forces the term to be closed:

$$\frac{\vdash\_v t : a \quad \vdash\_\omega \omega : a \leftrightarrow b}{\vdash\_v \omega \; t : b}$$

The most interesting type rule is the one for isomorphisms. We present the rule and then explain it in detail:

$$\begin{array}{llll} \Delta\_1 \vdash\_v v\_1 : a & \Delta\_n \vdash\_v v\_n : a & \forall i \neq j, v\_i \bot v\_j \quad \text{dim}(a) = n \\ \Delta\_1 \vdash\_v v'\_1 : b & \dots & \Delta\_n \vdash\_v v'\_n : b & \forall i \neq j, v'\_i \bot v'\_j \quad \text{dim}(b) = n \\ \hline \hline \vdash\_\omega \{ \ \mid \ v\_1 \leftrightarrow v'\_1 \mid \ v\_2 \leftrightarrow v'\_2 \; \dots \ \} : a \leftrightarrow b, \end{array} \tag{1}$$

The rule relies on two auxiliary conditions as motivated in the beginning of the section. These conditions are (i) the orthogonality judgment v ⊥ v′ that formalizes that patterns must be *non-overlapping* and (ii) the condition dim(a) = n which formalizes that patterns are *exhaustive*. The rules for deriving orthogonality of values or patterns are:

$$
\overline{\mathtt{inj}_l\, v_1 \perp \mathtt{inj}_r\, v_2} \qquad \overline{\mathtt{inj}_r\, v_1 \perp \mathtt{inj}_l\, v_2} \qquad \frac{v_1 \perp v_2}{\mathtt{inj}_l\, v_1 \perp \mathtt{inj}_l\, v_2} \qquad \frac{v_1 \perp v_2}{\mathtt{inj}_r\, v_1 \perp \mathtt{inj}_r\, v_2}
$$

$$
\frac{v_1 \perp v_2}{\langle v_1, v \rangle \perp \langle v_2, v' \rangle} \qquad \frac{v_1 \perp v_2}{\langle v, v_1 \rangle \perp \langle v', v_2 \rangle}
$$

The idea is simply that the left and right injections are disjoint subspaces of values. To characterize that a set of patterns is exhaustive, we associate a *dimension* with each type. For finite types, this is just the number of elements in the type and is inductively defined as follows: dim(**1**) = 1; dim(a ⊕ b) = dim(a) + dim(b); and dim(a ⊗ b) = dim(a)· dim(b). For a given type a, if a set of non-overlapping clauses has cardinality dim(a), it is exhaustive. Conversely, any set of exhaustive clauses for a type a either has cardinality dim(a) or can be extended to an equivalent exhaustive set of clauses of cardinality dim(a).
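The dimension function is directly computable. A minimal sketch, with types represented as nested tuples of our own choosing:

```python
# Finite value types as tagged tuples: ('1',) for the unit type,
# ('+', a, b) for a sum, ('*', a, b) for a product.
def dim(a):
    if a[0] == '1':
        return 1
    if a[0] == '+':
        return dim(a[1]) + dim(a[2])
    if a[0] == '*':
        return dim(a[1]) * dim(a[2])
    raise ValueError("not a finite type")

unit = ('1',)
bool_t = ('+', unit, unit)               # 1 (+) 1 has two elements
assert dim(bool_t) == 2
assert dim(('*', bool_t, bool_t)) == 4   # dim(a (x) b) = dim(a) * dim(b)
assert dim(('+', unit, ('*', bool_t, bool_t))) == 5
```

A set of non-overlapping clauses for a type `a` is exhaustive exactly when its cardinality reaches `dim(a)`.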

#### **2.3 Semantics**

We equip our language with a simple operational semantics on terms, using the natural notion of matching. To formally define it, we first introduce the notion of variable assignation, or valuation, which is a partial map from a finite set of variables (the support) to a set of values. We denote the matching of a value w against a pattern v and its associated valuation σ as σ[v] = w and define it as follows:

$$
\overline{\sigma[()] = ()} \qquad \overline{\{x \mapsto w\}[x] = w} \qquad \frac{\sigma[v] = w}{\sigma[\mathtt{inj}_l\, v] = \mathtt{inj}_l\, w} \qquad \frac{\sigma[v] = w}{\sigma[\mathtt{inj}_r\, v] = \mathtt{inj}_r\, w}
$$

$$
\frac{\sigma_1[v_1] = w_1 \quad \sigma_2[v_2] = w_2 \quad \mathrm{supp}(\sigma_1) \cap \mathrm{supp}(\sigma_2) = \emptyset \quad \sigma = \sigma_1 \cup \sigma_2}{\sigma[\langle v_1, v_2 \rangle] = \langle w_1, w_2 \rangle}
$$

If σ is a valuation whose support contains the variables of v, we write σ(v) for the value where the variables of v have been replaced with the corresponding values in σ.

Given these definitions, we can define the reduction relation on terms. The redex { | v_1 ↔ v′_1 | v_2 ↔ v′_2 ... } v reduces to σ(v′_i) whenever σ[v_i] = v. Because of the conditions on patterns, a matching pattern exists by exhaustivity of coverage, and this pattern is unique by the non-overlapping condition. Congruence holds: ω t → ω t′ whenever t → t′. As usual, we write s → t to say that s rewrites in one step to t and s →* t to say that s rewrites to t in 0 or more steps.

Because of the conditions set on patterns, the rewrite system is deterministic. More interestingly, we can swap the two sides of all pattern-matching clauses in an isomorphism ω to get ω⁻¹. The execution of ω⁻¹ is the reverse execution of ω in the sense that ω⁻¹(ω t) →* t and ω(ω⁻¹ t′) →* t′.
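The matching judgment σ[v] = w, substitution, and forward/backward execution of an iso can be sketched as follows. This is an informal interpreter of our own (the tagged-tuple value representation and the helper names `match`, `subst`, `apply_iso` are assumptions, not the paper's definitions):

```python
# Values/patterns: ('unit',), ('var', x), ('injl', v), ('injr', v),
# ('pair', v1, v2).
def match(pattern, value):
    """Return the valuation sigma with sigma[pattern] = value, or None."""
    kind = pattern[0]
    if kind == 'var':
        return {pattern[1]: value}
    if kind == 'unit':
        return {} if value == ('unit',) else None
    if kind in ('injl', 'injr'):
        return match(pattern[1], value[1]) if value[0] == kind else None
    # pair: linearity guarantees the two supports are disjoint
    s1 = match(pattern[1], value[1]) if value[0] == 'pair' else None
    if s1 is None:
        return None
    s2 = match(pattern[2], value[2])
    return {**s1, **s2} if s2 is not None else None

def subst(v, sigma):
    kind = v[0]
    if kind == 'var':
        return sigma[v[1]]
    if kind == 'unit':
        return v
    if kind in ('injl', 'injr'):
        return (kind, subst(v[1], sigma))
    return ('pair', subst(v[1], sigma), subst(v[2], sigma))

def apply_iso(clauses, value):
    for lhs, rhs in clauses:        # exactly one clause matches
        sigma = match(lhs, value)
        if sigma is not None:
            return subst(rhs, sigma)
    raise ValueError("non-exhaustive patterns")

# swap : a (x) b <-> b (x) a; its inverse is obtained by flipping clauses.
swap = [(('pair', ('var', 'x'), ('var', 'y')),
         ('pair', ('var', 'y'), ('var', 'x')))]
swap_inv = [(rhs, lhs) for lhs, rhs in swap]

v = ('pair', ('injl', ('unit',)), ('injr', ('unit',)))
assert apply_iso(swap_inv, apply_iso(swap, v)) == v
```

Inverting an iso is literally a syntactic operation on its clauses; determinism of matching makes both directions functions.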

### **3 Second-Order Functions, Lists, and Recursion**

The first-order reversible language from the previous section embodies symmetric pattern-matching clauses as its core notion of control. Its expressiveness is limited, however. We now show that it is possible to extend it to have more in common with a conventional functional language. To that end, we extend the language with the ability to parametrically manipulate isomorphisms, with a recursive type (lists), and with recursion.

#### **3.1 Terms and Types**

Formally, the language is now defined as follows.


We use variables f to span a set of iso-variables and variables x to span a set of term-variables. We extend the layer of isos so that it can be parameterized by a fixed number of other isos, i.e., we now allow higher-order manipulation of isos using λf.ω, iso-variables, and applications. Isos can now be used inside the definition of other isos with a let-notation. These let-constructs are however restricted to products of term-variables: they essentially serve as syntactic sugar for composition of isos. An extended value is then a value where some of its free variables are substituted with the result of the application of one or several isos. Given an extended value e, we define its *bottom value*, denoted with Val(e) as the value "at the end" of the let-chain: Val(v) = v, and Val(let p = ωp in e) = Val(e). The orthogonality of extended values is simply the orthogonality of their bottom value.

As usual, the type of lists [a] of elements of type a is a recursive type and is equivalent to **1** ⊕ (a ⊗ [a]). We build the value [] (empty list) as inj_l () and the term t_1 : t_2 (cons of t_1 and t_2) as inj_r ⟨t_1, t_2⟩. In addition, to take full advantage of recursive datatypes, it is natural to consider recursion. Modulo a termination guarantee it is possible to add a fixpoint to the language: we extend isos with the fixpoint constructor μf.ω. Some reversible languages allow infinite loops and must work with partial isomorphisms instead. Since we plan on using our language as a foundation for a quantum language we insist on termination.
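The list encoding can be illustrated concretely, with sums and pairs represented as tagged tuples (an illustrative encoding of our own, not the paper's syntax):

```python
# [a] ~ 1 (+) (a (x) [a]): [] is inj_l () and t1 : t2 is inj_r <t1, t2>.
NIL = ('injl', ('unit',))

def cons(head, tail):
    return ('injr', ('pair', head, tail))

def to_python_list(v):
    """Decode an encoded list into a Python list of encoded elements."""
    out = []
    while v[0] == 'injr':
        out.append(v[1][1])
        v = v[1][2]
    return out

tt = ('injl', ('unit',))   # booleans encoded as 1 (+) 1
ff = ('injr', ('unit',))
xs = cons(tt, cons(ff, NIL))
assert to_python_list(xs) == [tt, ff]
```

Every list value is ultimately a nest of injections and pairs, which is why list patterns fit the same orthogonal-decomposition machinery as finite types.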

Since the language features two kinds of variables, there are typing contexts (written Δ) consisting of base-level typed variables of the form x : a, and typing contexts (written Ψ) consisting of typed iso-variables of the form f : T. As terms and values contain both base-level and iso-variables, one needs both typing contexts, and typing judgments are therefore written as Δ; Ψ ⊢_v t : a. The updated rules for ⊢_v are found in Table 1. As the only possible free variables in isos are iso-variables, their typing judgments only need one context and are written as Ψ ⊢_ω ω : T.

The rules for typing derivations of isos are in Table 2. It is worthwhile mentioning that isos are treated in a usual, non-linear way: this is the purpose of the typing context separation. The intuition is that an iso is the description of a closed computation with respect to inputs: remark that isos cannot accept

**Table 1.** Typing rules for terms and values

$$
\overline{\emptyset; \Psi \vdash_v () : \mathbb{1}} \qquad \overline{x : a; \Psi \vdash_v x : a} \qquad \frac{\Delta; \Psi \vdash_v t : a}{\Delta; \Psi \vdash_v \mathtt{inj}_l\, t : a \oplus b} \qquad \frac{\Delta; \Psi \vdash_v t : b}{\Delta; \Psi \vdash_v \mathtt{inj}_r\, t : a \oplus b}
$$

$$
\frac{\Delta_1; \Psi \vdash_v t_1 : a \quad \Delta_2; \Psi \vdash_v t_2 : b}{\Delta_1, \Delta_2; \Psi \vdash_v \langle t_1, t_2 \rangle : a \otimes b} \qquad \frac{\Psi \vdash_\omega \omega : a \leftrightarrow b \quad \Delta; \Psi \vdash_v t : a}{\Delta; \Psi \vdash_v \omega\, t : b}
$$

$$
\frac{\Delta_1; \Psi \vdash_v t_1 : a \otimes b \quad \Delta_2, x : a, y : b; \Psi \vdash_v t_2 : c}{\Delta_1, \Delta_2; \Psi \vdash_v \mathtt{let}\ \langle x, y \rangle = t_1\ \mathtt{in}\ t_2 : c}
$$

**Table 2.** Typing rules for isos

$$
\frac{\begin{array}{c} \Delta_1; \Psi \vdash_v v_1 : a \quad \dots \quad \Delta_n; \Psi \vdash_v v_n : a \quad \mathrm{OD}_a\{v_1, \dots, v_n\} \\ \Delta_1; \Psi \vdash_v e_1 : b \quad \dots \quad \Delta_n; \Psi \vdash_v e_n : b \quad \mathrm{OD}_b^{ext}\{e_1, \dots, e_n\} \end{array}}{\Psi \vdash_\omega \{\ |\ v_1 \leftrightarrow e_1\ |\ v_2 \leftrightarrow e_2\ \dots\ \} : a \leftrightarrow b}
$$

$$
\frac{\Psi, f : a \leftrightarrow b \vdash_\omega \omega : T}{\Psi \vdash_\omega \lambda f.\omega : (a \leftrightarrow b) \to T} \qquad \overline{\Psi, f : T \vdash_\omega f : T} \qquad \frac{\Psi \vdash_\omega \omega_1 : (a \leftrightarrow b) \to T \quad \Psi \vdash_\omega \omega_2 : a \leftrightarrow b}{\Psi \vdash_\omega \omega_1\, \omega_2 : T}
$$

$$
\frac{\Psi, f : a \leftrightarrow b \vdash_\omega \omega : (a_1 \leftrightarrow b_1) \to \dots \to (a_n \leftrightarrow b_n) \to (a \leftrightarrow b) \quad \mu f.\omega\ \text{terminates in any finite context}}{\Psi \vdash_\omega \mu f.\omega : (a_1 \leftrightarrow b_1) \to \dots \to (a_n \leftrightarrow b_n) \to (a \leftrightarrow b)}
$$

value-types. As computations, they can be erased or duplicated without issue. On the other hand, variables of value types still need to be treated linearly.

In the typing rule for recursion, the condition "μf.ω terminates in any finite context" formally refers to the following requirement. A well-typed fixpoint μf.ω with Ψ ⊢_ω μf.ω : (a₁ ↔ b₁) → ··· → (aₙ ↔ bₙ) → (a ↔ b) is *terminating in a 0-context* if for all closed isos ωᵢ : aᵢ ↔ bᵢ not using fixpoints and for every closed value v of type a, the term ((μf.ω) ω₁ … ωₙ) v terminates. The fixpoint is *terminating in a (k+1)-context* if for all closed isos ωᵢ : aᵢ ↔ bᵢ terminating in k-contexts, and for every closed value v of type a, the term ((μf.ω) ω₁ … ωₙ) v terminates. Finally, the fixpoint is *terminating in any finite context* if it is terminating in k-contexts for all k.

With the addition of lists, the non-overlapping and exhaustivity conditions need to be modified. The main problem is that we can no longer define the dimension of types using natural numbers: [a] is in essence an infinite sum, and would have an "infinite" dimension. Instead, we combine the two conditions into the concept of *orthogonal decomposition*. Formally, given a type a, we say that a set <sup>S</sup> of patterns is an *orthogonal decomposition*, written ODa(S), when these patterns are pairwise orthogonal and when they cover the whole type. We

#### **Table 3.** Reduction rules

$$\frac{t_1 \to t_2}{C[t_1] \to C[t_2]}\ \text{Cong} \qquad \frac{\sigma(p) = v_1}{\mathtt{let}\ p = v_1\ \mathtt{in}\ t_2 \to \sigma(t_2)}\ \text{LetE}$$

$$\frac{\sigma(v_i) = v}{\{\ |\ v_1 \leftrightarrow t_1\ |\ \dots\ |\ v_n \leftrightarrow t_n\ \}\ v \to \sigma(t_i)}\ \text{IsoApp} \qquad \frac{}{(\lambda f.\omega)\ \omega_2 \to \omega[\omega_2/f]}\ \text{HIsoApp}$$

$$\frac{\Psi, f : a \leftrightarrow b \vdash_\omega \omega : (a_1 \leftrightarrow b_1) \to \cdots \to (a_n \leftrightarrow b_n) \to (a \leftrightarrow b)}{\mu f.\omega \to \lambda f_1 \dots f_n.(\omega[((\mu f.\omega)\, f_1 \dots f_n)/f])\, f_1 \dots f_n}\ \text{IsoRec}$$

formally define ODa(S) as follows. For all types <sup>a</sup>, ODa{x} is valid. For the unit type, OD**1**{()} is valid. If ODa(S) and ODb(T), then

$$\begin{aligned} \text{OD}\_{a \oplus b}(\{\text{inj}\_l \ v \mid v \in S\} \cup \{\text{inj}\_r \ v \mid v \in T\})\\ \text{and} \quad \text{OD}\_{a \otimes b}\{\langle v\_1, v\_2 \rangle \mid v\_1 \in S, \ v\_2 \in T, \text{ FV}(v\_1) \cap \text{FV}(v\_2) = \emptyset\}, \end{aligned}$$

where FV(t) stands for the set of free value-variables in t. We then extend the notion of orthogonal decomposition to extended values as follows: if S is a set of extended values, OD_a^{ext}(S) holds whenever OD_a{Val(e) | e ∈ S} does. With this new characterization, the typing rule for isos in Eq. 1 still holds, and can be re-written using orthogonal decompositions as shown in Table 2.
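For finite types, both conditions are mechanically checkable: a set of patterns is an orthogonal decomposition exactly when every closed value is matched by exactly one pattern. A minimal sketch (the helper names `match` and `is_orthogonal_decomposition` are ours, not from the paper; booleans are encoded as strings, pairs as tuples):

```python
# Sketch: checking that a set of patterns is an orthogonal decomposition of a
# finite type, i.e. the patterns are pairwise orthogonal and jointly cover
# every closed value. Helper names are illustrative, not from the paper.

def match(pattern, value):
    """Pairs match componentwise; the constants tt/ff match themselves;
    any other string is a pattern variable and matches anything."""
    if isinstance(pattern, tuple):
        return (isinstance(value, tuple) and len(pattern) == len(value)
                and all(match(p, v) for p, v in zip(pattern, value)))
    if pattern in ("tt", "ff"):
        return pattern == value
    return True  # pattern variable

def is_orthogonal_decomposition(patterns, closed_values):
    """True iff every closed value is matched by exactly one pattern."""
    return all(sum(match(p, v) for p in patterns) == 1
               for v in closed_values)

BOOLS = ["tt", "ff"]
PAIRS = [(a, b) for a in BOOLS for b in BOOLS]

# The left-hand-side patterns of cnot: <ff, x>, <tt, ff>, <tt, tt>
cnot_patterns = [("ff", "x"), ("tt", "ff"), ("tt", "tt")]
```

The cnot patterns pass the check, while an overlapping pair such as `<ff, x>` and `<y, tt>` fails it (the value `<ff, tt>` is matched twice).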

#### **3.2 Semantics**

In Table 3 we present the reduction rules for the reversible language. We assume that the reduction relation applies to well-typed terms. In the rules, the notation C[−] stands for an *applicative context*, defined as C[−] ::= [−] | inj_l C[−] | inj_r C[−] | (C[−]) ω | {···} (C[−]) | let p = C[−] in t₂ | ⟨C[−], v⟩ | ⟨v, C[−]⟩.

The inversion of isos is still possible but more subtle than in the first-order case. We define an inversion operation (−)⁻¹ on iso types by (a ↔ b)⁻¹ := (b ↔ a) and ((a ↔ b) → T)⁻¹ := ((b ↔ a) → T⁻¹). Inversion of isos is defined as follows. For fixpoints, (μf.ω)⁻¹ := μf.(ω⁻¹). For variables, f⁻¹ := f. For applications, (ω₁ ω₂)⁻¹ := ω₁⁻¹ ω₂⁻¹. For abstractions, (λf.ω)⁻¹ := λf.(ω⁻¹). Finally, clauses are inverted as follows:

$$\begin{pmatrix} v_1 \leftrightarrow \mathtt{let}\ p_1 = \omega_1\ p'_1\ \mathtt{in} \\ \cdots \\ \mathtt{let}\ p_n = \omega_n\ p'_n\ \mathtt{in}\ v'_1 \end{pmatrix}^{-1} := \begin{pmatrix} v'_1 \leftrightarrow \mathtt{let}\ p'_n = \omega_n^{-1}\ p_n\ \mathtt{in} \\ \cdots \\ \mathtt{let}\ p'_1 = \omega_1^{-1}\ p_1\ \mathtt{in}\ v_1 \end{pmatrix}.$$

Note that (−)⁻¹ only inverts first-order arrows (↔), not second-order arrows (→). This is reflected by the fact that iso-variables are non-linear while value-variables are linear, which is due to the clear separation of the two layers of the language.

The rewriting system satisfies the usual properties for well-typed terms: it is terminating, well-typed closed terms have a unique normal value-form, and it preserves typing.

**Theorem 1.** *The inversion operation is well-typed, in the sense that if* f₁ : a₁ ↔ b₁, …, fₙ : aₙ ↔ bₙ ⊢_ω ω : T*, then we also have* f₁ : b₁ ↔ a₁, …, fₙ : bₙ ↔ aₙ ⊢_ω ω⁻¹ : T⁻¹*.*

Thanks to the fact that the language is terminating, we also recover the operational result of Sect. 2.3.

**Theorem 2.** *Consider a well-typed, closed iso* ⊢_ω ω : a ↔ b*, and suppose that* ⊢_v v : a *and* ⊢_v w : b*. Then* ω⁻¹(ω v) →∗ v *and* ω(ω⁻¹ w) →∗ w*.*

### **4 Examples**

In the previous sections, we developed a novel classical reversible language with a familiar syntax based on pattern-matching. The language includes a limited notion of higher-order functions and (terminating) recursive functions. We illustrate the expressiveness of the language with a few examples and motivate the changes and extensions needed to adapt the language to the quantum domain.

We encode booleans as follows: **B** = **1** ⊕ **1**, tt = inj_l (), and ff = inj_r (). One of the easiest functions to define is not : **B** ↔ **B**, which flips a boolean. The controlled-not gate, which flips the second bit when the first bit is tt, can also be expressed:

$$\mathtt{not} : \mathbb{B} \leftrightarrow \mathbb{B} = \begin{pmatrix} \mathtt{ff} \leftrightarrow \mathtt{tt} \\ \mathtt{tt} \leftrightarrow \mathtt{ff} \end{pmatrix}, \quad \mathtt{cnot} : \mathbb{B} \otimes \mathbb{B} \leftrightarrow \mathbb{B} \otimes \mathbb{B} = \begin{pmatrix} \langle \mathtt{ff}, x \rangle \leftrightarrow \langle \mathtt{ff}, x \rangle \\ \langle \mathtt{tt}, \mathtt{ff} \rangle \leftrightarrow \langle \mathtt{tt}, \mathtt{tt} \rangle \\ \langle \mathtt{tt}, \mathtt{tt} \rangle \leftrightarrow \langle \mathtt{tt}, \mathtt{ff} \rangle \end{pmatrix}.$$

All the patterns in the previous two functions form orthogonal decompositions, which guarantees reversibility as desired.
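Ignoring the formal semantics, a first-order iso on a finite type can be pictured as a finite bijection on closed values, with inversion simply swapping the two sides of every clause. A minimal sketch (our encoding, not the paper's; the `<ff, x>` clause of cnot is expanded over x):

```python
# Sketch (illustrative): first-order isos on finite types modeled as finite
# bijections between closed values; inversion swaps every clause.

def invert(iso):
    """Invert an iso by swapping the two sides of each clause."""
    return {rhs: lhs for lhs, rhs in iso.items()}

NOT = {"ff": "tt", "tt": "ff"}

# cnot, with the pattern-variable clause <ff, x> expanded over x in {ff, tt}:
CNOT = {("ff", "ff"): ("ff", "ff"), ("ff", "tt"): ("ff", "tt"),
        ("tt", "ff"): ("tt", "tt"), ("tt", "tt"): ("tt", "ff")}
```

Inverting twice yields the original iso, and composing an iso with its inverse is the identity, matching the operational statement ω⁻¹(ω v) →∗ v of Theorem 2.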

By using the abstraction facilities of the language, we can define higher-order operations that build complex reversible functions from simpler ones. For example, we can define a conditional expression parameterized by the functions used in the two branches:

$$\begin{array}{l} \mathsf{if} : (a \leftrightarrow b) \to (a \leftrightarrow b) \to (\mathbb{B} \otimes a \leftrightarrow \mathbb{B} \otimes b) \\ \mathsf{if} = \lambda g.\lambda h. \begin{pmatrix} \langle \mathtt{tt}, x \rangle \leftrightarrow \mathtt{let}\ y = g\ x\ \mathtt{in}\ \langle \mathtt{tt}, y \rangle \\ \langle \mathtt{ff}, x \rangle \leftrightarrow \mathtt{let}\ y = h\ x\ \mathtt{in}\ \langle \mathtt{ff}, y \rangle \end{pmatrix} \end{array}$$

Using if and the obvious definition of the identity iso id, we can define ctrl : (a ↔ a) → (**B** ⊗ a ↔ **B** ⊗ a) as ctrl f = if f id and recover an alternative definition of cnot as ctrl not. We can then define the controlled-controlled-not gate (a.k.a. the Toffoli gate) by writing ctrl cnot. We can even iterate this construction using fixpoints to produce an n-controlled-not function that takes a list of n control bits and a target bit, and flips the target bit iff all the control bits are tt:

$$\begin{array}{l} \mathsf{cnot}* : ([\mathbb{B}] \otimes \mathbb{B}) \leftrightarrow ([\mathbb{B}] \otimes \mathbb{B}) \\ \mathsf{cnot}* = \mu f. \begin{pmatrix} \langle [],\, tb \rangle \leftrightarrow \mathtt{let}\ tb' = \mathtt{not}\ tb\ \mathtt{in}\ \langle [],\, tb' \rangle \\ \langle \mathtt{ff} : cbs,\, tb \rangle \leftrightarrow \langle \mathtt{ff} : cbs,\, tb \rangle \\ \langle \mathtt{tt} : cbs,\, tb \rangle \leftrightarrow \mathtt{let}\ \langle cbs', tb' \rangle = f\ \langle cbs, tb \rangle\ \mathtt{in}\ \langle \mathtt{tt} : cbs',\, tb' \rangle \end{pmatrix} \end{array}$$
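The constructions above can be mimicked with ordinary functions to convey the intent; a sketch (our encoding: booleans as strings, isos as Python functions, `ctrl` and `cnot_star` named after the paper's constructs but not its syntax):

```python
# Sketch: ctrl as a higher-order combinator, and cnot* as a structurally
# recursive function on lists. Illustrative encoding, not the paper's syntax.

def not_(b):
    return {"ff": "tt", "tt": "ff"}[b]

def ctrl(f):
    """Apply the iso f to the second component iff the first bit is tt."""
    def g(v):
        bit, x = v
        return (bit, f(x) if bit == "tt" else x)
    return g

cnot = ctrl(not_)       # controlled-not
toffoli = ctrl(cnot)    # controlled-controlled-not

def cnot_star(v):
    """Flip the target bit iff all control bits are tt (clause by clause)."""
    cbs, tb = v
    if not cbs:                           # <[], tb>       <-> <[], not tb>
        return ([], not_(tb))
    if cbs[0] == "ff":                    # <ff:cbs, tb>   <-> <ff:cbs, tb>
        return (cbs, tb)
    cbs2, tb2 = cnot_star((cbs[1:], tb))  # <tt:cbs, tb>: recurse on the tail
    return (["tt"] + cbs2, tb2)
```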

The language is also expressive enough to write conventional recursive (and higher-order) programs. We illustrate this expressiveness with the usual map operation and an accumulating variant, mapAccu:

$$\begin{array}{l} \mathsf{map} : (a \leftrightarrow b) \to ([a] \leftrightarrow [b]) \\ \mathsf{map} = \lambda g.\mu f. \begin{pmatrix} [] \leftrightarrow [] \\ h : t \leftrightarrow \mathtt{let}\ x = g\ h\ \mathtt{in}\ \mathtt{let}\ y = f\ t\ \mathtt{in}\ x : y \end{pmatrix}, \\[2ex] \mathsf{mapAccu} : (a \otimes b \leftrightarrow a \otimes c) \to (a \otimes [b] \leftrightarrow a \otimes [c]) \\ \mathsf{mapAccu} = \lambda g.\mu f. \begin{pmatrix} \langle x, [] \rangle \leftrightarrow \langle x, [] \rangle \\ \langle x, h : t \rangle \leftrightarrow \mathtt{let}\ \langle y, h' \rangle = g\ \langle x, h \rangle\ \mathtt{in}\ \mathtt{let}\ \langle z, t' \rangle = f\ \langle y, t \rangle\ \mathtt{in}\ \langle z, h' : t' \rangle \end{pmatrix}. \end{array}$$
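Read as an ordinary functional program, map is structurally recursive on its list argument; a sketch (our encoding, with `rev_map` standing in for the paper's map):

```python
# Sketch: the reversible map combinator as a structurally recursive function
# on lists. Illustrative: in the paper, map g is an iso whose inverse is
# map (g^-1); here we just observe that behavior for a self-inverse g.

def rev_map(g):
    def f(xs):
        if not xs:                # clause []    <-> []
            return []
        h, t = xs[0], xs[1:]      # clause h : t <-> (g h) : (f t)
        return [g(h)] + f(t)
    return f

def not_(b):
    return {"ff": "tt", "tt": "ff"}[b]

# Since not_ is its own inverse, rev_map(not_) is too.
```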

The three examples cnot\*, map, and mapAccu use fixpoints which are clearly terminating in any finite context: indeed, the functions are structurally recursive. A formal definition of this notion for the reversible language is as follows.

**Definition 1.** Define a *structurally recursive type* as a type of the form [a] ⊗ b₁ ⊗ … ⊗ bₙ. Let ω = {vᵢ ↔ eᵢ | i ∈ I} be an iso such that f : a ↔ b ⊢_ω ω : a ↔ c, where a is a structurally recursive type. We say that μf.ω is *structurally recursive* provided that for each i ∈ I, the value vᵢ is either of the form ⟨[], p₁, …, pₙ⟩ or of the form ⟨h : t, p₁, …, pₙ⟩. In the former case, eᵢ does not contain f as a free variable. In the latter case, eᵢ is of the form C[f ⟨t, p′₁, …, p′ₙ⟩], where C is a context of the form C[−] ::= [−] | let p = C[−] in t | let p = t in C[−].

This definition will be critical for quantum loops in the next section.

### **5 From Reversible Isos to Quantum Control**

In the language presented so far, an iso ω : a ↔ b describes a bijection between the set B_a of closed values of type a and the set B_b of closed values of type b. If one regards B_a and B_b as the basis elements of vector spaces ⟦a⟧ and ⟦b⟧, the iso ω becomes a 0/1 matrix.

As an example, consider an iso ω defined using three clauses of the form { | v₁ ↔ v′₁ | v₂ ↔ v′₂ | v₃ ↔ v′₃ }. From the exhaustivity and non-overlapping conditions it follows that the space ⟦a⟧ can be split into the direct sum of the three subspaces ⟦a⟧_{vᵢ} (i = 1, 2, 3) generated by the vᵢ. Similarly, ⟦b⟧ is split into the direct sum of the subspaces ⟦b⟧_{v′ᵢ} generated by the v′ᵢ. One can therefore represent ω as the matrix shown in Fig. 4: the "1" in the column of vᵢ indicates to which subspace of ⟦b⟧ an element of ⟦a⟧_{vᵢ} is sent.
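Concretely, for a classical iso such as cnot this 0/1 matrix is a permutation matrix, hence trivially unitary. A sketch of the construction (our encoding, not from the paper):

```python
# Sketch: the 0/1 matrix of a classical iso, with rows and columns indexed by
# closed values; for a bijection it is a permutation matrix. Illustrative.

BASIS = [("ff", "ff"), ("ff", "tt"), ("tt", "ff"), ("tt", "tt")]
CNOT = {("ff", "ff"): ("ff", "ff"), ("ff", "tt"): ("ff", "tt"),
        ("tt", "ff"): ("tt", "tt"), ("tt", "tt"): ("tt", "ff")}

def matrix_of(iso, basis):
    """Column v has a single 1, in the row of iso(v)."""
    return [[1 if iso[v] == w else 0 for v in basis] for w in basis]

M = matrix_of(CNOT, BASIS)
```

Every row and every column of `M` contains exactly one 1, which is the matrix form of the exhaustivity and non-overlapping conditions.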

In Sect. 2.2 we discussed the fact that vᵢ ⊥ vⱼ when i ≠ j. This notation hints at the fact that ⟦a⟧ and ⟦b⟧ could be seen as Hilbert spaces and the mapping ω as a unitary map from ⟦a⟧ to ⟦b⟧. The purpose of this section is to extend and formalize precisely the correspondence between isos and unitary maps.

The definition of clauses is extended following this idea of seeing isos as unitaries, and not only bijections on basis elements of the

$$\left\{ \begin{array}{l} \mid\ v_1 \leftrightarrow a_{11} v'_1 + a_{21} v'_2 + a_{31} v'_3 \\ \mid\ v_2 \leftrightarrow a_{12} v'_1 + a_{22} v'_2 + a_{32} v'_3 \\ \mid\ v_3 \leftrightarrow a_{13} v'_1 + a_{23} v'_2 + a_{33} v'_3 \end{array} \right\}$$

input space. We therefore essentially propose to generalize the clauses to complex linear combinations of values on the right-hand side, as shown on the left, with the side condition

that the matrix of Fig. 5 is unitary. We define in Sect. 5.1 how this extends to second-order.

#### **5.1 Extending the Language to Linear Combinations of Terms**

The quantum unitary language extends the reversible language from the previous section by closing extended values and terms under complex, finite linear combinations. For example, if v<sup>1</sup> and v<sup>2</sup> are values and α and β are complex numbers, α · v<sup>1</sup> + β · v<sup>2</sup> is now an extended value.

Several approaches exist for performing such an extension. One can update the reduction strategy to be able to reduce these sums and scalar multiplications to normal forms [12,18], or one can instead consider terms modulo the usual algebraic equalities [13,18]: this is the strategy we follow for this paper.

When extending a language to linear combinations of terms in a naive way, this added structure may generate inconsistencies in the presence of unconstrained fixpoints [12,13,18]. The weak termination condition we imposed on fixpoints in the classical language was enough to guarantee reversibility. In the presence of linear combinations, we want the much stronger guarantee of unitarity. For this reason, we instead require fixpoints to be *structurally recursive*.

The quantum unitary language is defined by allowing sums of terms and values and multiplication by complex numbers: if t and t′ are terms, so is α · t + t′. Terms and values are taken modulo the equational theory of modules. We furthermore consider the value and term constructs ⟨−, −⟩, let p = − in −, inj_l (−), inj_r (−) distributive over sum and scalar multiplication. We do *not*, however, take iso-constructions as distributive over sum and scalar multiplication: { | v₁ ↔ αv₂ + βv₃ } is *not* the same thing as α { | v₁ ↔ v₂ } + β { | v₁ ↔ v₃ }. This is in the spirit of Lineal [11,12].

The typing rules for terms and extended values are updated as follows. We only allow linear combinations of terms and values of the same type and of the same free variables. Fixpoints are now required to be *structurally recursive*, as introduced in Definition 1. Finally, an iso is now not only performing an "identity" as in Fig. 4 but a true unitary operation:

$$\frac{\begin{array}{c} \Delta_1; \Psi \vdash_v v_1 : a \quad \dots \quad \Delta_n; \Psi \vdash_v v_n : a \qquad \Delta_1; \Psi \vdash_v e_1 : b \quad \dots \quad \Delta_n; \Psi \vdash_v e_n : b \\ \text{OD}_a\{v_1, \dots, v_n\} \qquad \text{OD}_b^{ext}\{e_1, \dots, e_n\} \qquad \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix} \text{ is unitary} \end{array}}{\Psi \vdash_\omega \{\ |\ v_1 \leftrightarrow a_{11} e_1 + \dots + a_{n1} e_n\ |\ \dots\ |\ v_n \leftrightarrow a_{1n} e_1 + \dots + a_{nn} e_n\ \} : a \leftrightarrow b}$$

The reduction relation is updated so that it remains deterministic in this extended setting. It is split into two parts: the reduction of pure terms, i.e. non-extended terms or values, and the reduction of linear combinations thereof. Pure terms and values reduce using the rules of Table 3. We do not extend applicative contexts to linear combinations. For linear combinations of pure terms, we simply require that *all* pure terms in the combination that are not in normal form are reduced. This keeps the extended reduction relation deterministic.

*Example 1.* This allows one to define an iso behaving as the Hadamard gate, or a slightly more complex iso conditionally applying another iso, whose behavior as a matrix is shown in Fig. 6.

$$\begin{array}{ll} \mathtt{Had} : \mathbb{B} \leftrightarrow \mathbb{B} & \mathtt{Gate} : \mathbb{B} \otimes \mathbb{B} \leftrightarrow \mathbb{B} \otimes \mathbb{B} \\ \begin{pmatrix} \mathtt{tt} \leftrightarrow \frac{1}{\sqrt{2}} \mathtt{tt} + \frac{1}{\sqrt{2}} \mathtt{ff} \\ \mathtt{ff} \leftrightarrow \frac{1}{\sqrt{2}} \mathtt{tt} - \frac{1}{\sqrt{2}} \mathtt{ff} \end{pmatrix}, & \begin{pmatrix} \langle \mathtt{tt}, x \rangle \leftrightarrow \mathtt{let}\ y = \mathtt{not}\ x\ \mathtt{in}\ \frac{1}{\sqrt{2}} \langle \mathtt{tt}, y \rangle + \frac{1}{\sqrt{2}} \langle \mathtt{ff}, y \rangle \\ \langle \mathtt{ff}, x \rangle \leftrightarrow \mathtt{let}\ y = \mathtt{Id}\ x\ \mathtt{in}\ \frac{1}{\sqrt{2}} \langle \mathtt{tt}, y \rangle - \frac{1}{\sqrt{2}} \langle \mathtt{ff}, y \rangle \end{pmatrix}. \end{array}$$
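The unitarity side condition of the extended typing rule can be checked numerically for Had: its clause matrix A must satisfy A·A† = I. A sketch in plain Python (helper names are ours):

```python
# Sketch: checking the unitarity side condition for the clause matrix of Had.
# Plain Python complex arithmetic; helper names are illustrative.
import math

s = 1 / math.sqrt(2)
HAD = [[complex(s), complex(s)],
       [complex(s), complex(-s)]]

def times_dagger(a):
    """Compute A * A^dagger for a square complex matrix A."""
    n = len(a)
    return [[sum(a[i][k] * a[j][k].conjugate() for k in range(n))
             for j in range(n)] for i in range(n)]

def is_unitary(a, eps=1e-9):
    """A square matrix is unitary iff A * A^dagger is the identity."""
    p = times_dagger(a)
    n = len(a)
    return all(abs(p[i][j] - (1 if i == j else 0)) < eps
               for i in range(n) for j in range(n))
```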

With this extension to linear combinations of terms, one can characterize normal forms as follows.

**Lemma 1 (Structure of the Normal Forms).** *Let* ω *be such that* ⊢_ω ω : a ↔ b*. For every closed value* v *of type* a*, the term* ω v *rewrites to a normal form* ∑_{i=1}^{N} αᵢ · wᵢ*, where* N < ∞*, each* wᵢ *is a closed value of type* b*, and* ∑ᵢ |αᵢ|² = 1*.*

*Proof.* The fact that ω v converges to a normal form is a corollary of the fact that we impose structural recursion on fixpoints. The property of the structure of the normal form is then proven by induction on the maximal number of steps it takes to reach it. It uses the restriction on the introduction of sums in the typing rule for clauses in isos and the determinism of the reduction.

In the classical setting, isos describe bijections between sets of closed values: this was proven by considering the behavior of an iso against its inverse. In the presence of linear combinations of terms, we claim that isos describe more than bijections: they describe unitary maps. In the next sections, we discuss how types can be understood as Hilbert spaces (Sect. 5.2) and isos as unitary maps (Sects. 5.3 and 5.4).

#### **5.2 Modeling Types as Hilbert Spaces**

By allowing complex linear combinations of terms, closed normal forms of finite types such as **B** or **B** ⊗ **B** can be regarded as complex vector spaces whose bases consist of closed values. For example, **B** is associated with ⟦**B**⟧ = {α · tt + β · ff | α, β ∈ **C**} ≡ **C**². We can consider this space as a complex Hilbert space where the scalar product is defined on basis elements in the obvious way: ⟨v|v⟩ = 1 and ⟨v|w⟩ = 0 if v ≠ w. The map Had of Example 1 is then effectively a unitary map on the space ⟦**B**⟧.

The problem comes from lists: the type [**1**] is inhabited by infinitely many closed values: [], [()], [(),()], [(),(),()], … To account for this case, we need to consider infinite-dimensional complex Hilbert spaces. In general, a complex Hilbert space [19] is a complex vector space endowed with a scalar product that is complete with respect to the distance induced by the scalar product. The completeness requirement implies, for example, that the infinite linear combination [] + (1/2)·[()] + (1/4)·[(),()] + (1/8)·[(),(),()] + ··· needs to be an element of ⟦[**1**]⟧. To account for these limit elements, we propose to use the standard [19] Hilbert space ℓ² of infinite sequences.

**Definition 2.** Let a be a value type. As before, we write B_a for the set of closed values of type a, that is, B_a = {v | ⊢_v v : a}. The *span of a* is defined as the Hilbert space ⟦a⟧ = ℓ²(B_a) consisting of sequences (φ_v)_{v∈B_a} of complex numbers indexed by B_a such that ∑_{v∈B_a} |φ_v|² < ∞. The scalar product on this space is defined by ⟨(φ_v)_{v∈B_a} | (ψ_v)_{v∈B_a}⟩ = ∑_{v∈B_a} φ̄_v ψ_v.

We shall use the following conventions. A closed value v of type a is identified with the sequence (δ_{v,v′})_{v′∈B_a}, where δ_{v,v} = 1 and δ_{v,v′} = 0 if v ≠ v′. An element (φ_v)_{v∈B_a} of ⟦a⟧ is also written as the infinite, formal sum ∑_{v∈B_a} φ_v · v.

#### **5.3 Modeling Isos as Bounded Linear Maps**

We can now define the linear map associated with an iso.

**Definition 3.** For each closed iso ⊢_ω ω : a ↔ b, we define ⟦ω⟧ as the linear map from ⟦a⟧ to ⟦b⟧ sending the closed value v : a to the normal form of the term ω v : b under the rewrite system.

In general, the fact that ⟦ω⟧ is well-defined is not trivial. Before formally stating this in Theorem 3, let us first try to understand what could go wrong. The problem comes from the fact that the space ⟦a⟧ is in general not finite-dimensional. Consider the iso map Had : [**B**] ↔ [**B**]. Any closed value v : [**B**] is a list, and the term (map Had) v rewrites to a normal form consisting of a linear combination of lists. Denote by L_v the linear combination associated with v. An element of ⟦[**B**]⟧ is a sequence φ = (φ_v)_{v∈B_{[**B**]}}. From Definition 3, the map ⟦map Had⟧ sends the element φ ∈ ⟦[**B**]⟧ to ∑_{v∈B_{[**B**]}} φ_v · L_v. This is an infinite sum of sums of complex numbers, so we need to make sure that it is well-defined: this is the purpose of the next result. Because of the constraints on the language, we can even show that it is a *bounded* linear map.

In the case of the map ⟦map Had⟧, we can understand why this works as follows. The space ⟦[**B**]⟧ can be decomposed as the direct sum ⊕_{i=0}^∞ E_i, where E_i is generated by all the lists of booleans of length i. The map ⟦map Had⟧ acts locally on each finite-dimensional subspace E_i; it is therefore well-defined. Because of the unitarity constraint on the linear combinations appearing in Had, the operation performed by map Had sends elements of norm 1 to elements of norm 1. This idea can be formalized and yields the following theorem.
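This norm preservation can be observed concretely: applying map Had to a basis list of length i yields a superposition of 2^i lists of the same length whose squared amplitudes sum to 1. A sketch (our encoding of states as dictionaries from value tuples to amplitudes, not from the paper):

```python
# Sketch: "map Had" applied to one basis list of booleans, as a dictionary
# mapping output lists to amplitudes. Illustrative encoding.
import math

s = 1 / math.sqrt(2)
HAD = {"tt": {"tt": s, "ff": s}, "ff": {"tt": s, "ff": -s}}

def map_had(xs):
    """Map Had over one basis list, returning {output list: amplitude}."""
    out = {(): 1.0}
    for x in xs:
        out = {lst + (y,): amp * a
               for lst, amp in out.items()
               for y, a in HAD[x].items()}
    return out

state = map_had(("tt", "ff"))
```

The result stays inside the subspace E_2 of lists of length 2, and its squared amplitudes sum to 1, as the theorem below requires.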

**Theorem 3.** *For each closed iso* ⊢_ω ω : a ↔ b*, the linear map* ⟦ω⟧ : ⟦a⟧ → ⟦b⟧ *is well-defined and bounded.*

#### **5.4 Modeling Isos as Unitary Maps**

In this section, we show not only that closed isos can be modeled as bounded linear maps, but that these linear maps are in fact unitary. The difficulty comes from fixpoints. We first consider the case of isos written without fixpoints, and then the case with fixpoints.

*Without recursion.* The case without recursion is relatively easy to treat, as the linear map modeling the iso can be compositionally constructed out of elementary unitary maps.

**Theorem 4.** *Given a closed iso* ⊢_ω ω : a ↔ b *defined without the use of recursion, the linear map* ⟦ω⟧ : ⟦a⟧ → ⟦b⟧ *is unitary.*

The proof of the theorem relies on the fact that to each closed iso ⊢_ω ω : a ↔ b one can associate an operationally equivalent iso ω′ : a ↔ b that uses neither iso-variables nor lambda-abstractions. We can define a notion of *depth* of an iso as the number of nested isos. The proof is then done by induction on the depth of the iso ω′: it is possible to construct a unitary map for ω′ using the unitary maps of its nested isos as elementary building blocks.

As an illustration, the semantics of Gate of Example 1 is given in Fig. 6.

*Isos with structural recursion.* When considering fixpoints, we can no longer rely on this finite compositional construction: the space ⟦a⟧ can no longer be regarded as a *finite* sum of subspaces described by the clauses.

We therefore need to rely on the formal definition of unitary maps on general, infinite-dimensional Hilbert spaces. On top of being linear and bounded, a map ⟦ω⟧ : ⟦a⟧ → ⟦b⟧ is unitary if (1) it preserves the scalar product: ⟨⟦ω⟧(e)|⟦ω⟧(f)⟩ = ⟨e|f⟩ for all e and f in ⟦a⟧, and (2) it is surjective.

**Theorem 5.** *Given a closed iso* ⊢_ω ω : a ↔ b *that can use structural recursion, the linear map* ⟦ω⟧ : ⟦a⟧ → ⟦b⟧ *is unitary.*

The proof uses the idea highlighted in Sect. 5.3: for a structurally recursive iso of type [a] ⊗ b ↔ c, the Hilbert space ⟦[a] ⊗ b⟧ can be split into a canonical decomposition E₀ ⊕ E₁ ⊕ E₂ ⊕ ···, where E_i contains only the values of the form ⟨[x₁ … x_i], y⟩, i.e. those containing lists of length i. On each E_i, the iso is equivalent to an iso without structural recursion.

### **6 Conclusion**

In this paper, we proposed a reversible language amenable to quantum superpositions of values. The language features a weak form of higher-order programming that is nonetheless expressive enough to define interesting maps such as generalized Toffoli operators. We sketched how this language effectively encodes bijections in the classical case and unitary operations in the quantum case. It would be interesting to see how this relates to join inverse categories [14,15].

In the vectorial extension of the language, we have the same control structures as in the classical, reversible language. Tests are captured by clauses and naturally yield quantum tests: this is similar to what can be found in QML [5,6], yet more general, since the QML approach is restricted to if-then-else constructs. The novel aspect of quantum control that we are able to capture here is a notion of *quantum loop*. Such loops were believed to be hard, if not impossible, to obtain. What makes them work in our approach is the fact that we stay firmly within a closed quantum system, without measurements. This makes it possible to consider only unitary maps and frees us from the Löwner order on positive matrices [6]. As we restrict fixpoints to structural recursion, valid isos are regular enough to capture unitarity. Ying [7] also proposes a framework for quantum while-loops that is similar in spirit to our approach at the level of denotations: in his approach, the control part of the loop is modeled using an external system of "coins", which in our case corresponds to conventional lists. Reducing the manipulation of this external coin system to iteration over lists allowed us to give a simple operational semantics for the language.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Quantitative Models

# **The Complexity of Graph-Based Reductions for Reachability in Markov Decision Processes**

Stéphane Le Roux<sup>1(B)</sup> and Guillermo A. Pérez<sup>2</sup>

<sup>1</sup> Department of Mathematics, Technische Universität Darmstadt, Darmstadt, Germany leroux@mathematik.tu-darmstadt.de <sup>2</sup> Département d'Informatique, Université libre de Bruxelles, Brussels, Belgium

gperezme@ulb.ac.be

**Abstract.** We study the never-worse relation (NWR) for Markov decision processes with an infinite-horizon reachability objective. A state *q* is never worse than a state *p* if the maximal probability of reaching the target set of states from *p* is at most the same value from *q*, regardless of the probabilities labelling the transitions. Extremal-probability states, end components, and essential states are all special cases of the equivalence relation induced by the NWR. Using the NWR, states in the same equivalence class can be collapsed. Then, actions leading to sub-optimal states can be removed. We show that the natural decision problem associated to computing the NWR is coNP-complete. Finally, we extend a previously known incomplete polynomial-time iterative algorithm to under-approximate the NWR.

### **1 Introduction**

Markov decision processes (MDPs) are a useful model for decision-making in the presence of a stochastic environment. They are used in several fields, including robotics, automated control, economics, manufacturing and in particular planning [20], model-based reinforcement learning [22], and formal verification [1]. We elaborate on the use of MDPs and the need for graph-based reductions thereof in verification and reinforcement learning applications below.

Several verification problems for MDPs reduce to reachability [1,5]. For instance, MDPs can be model checked against linear-time objectives (expressed in, say, LTL) by constructing an omega-automaton recognizing the set of runs that satisfy the objective and considering the product of the automaton with the original MDP [6]. In this product MDP, accepting end components—a generalization of strongly connected components—are identified and selected as target components. The question of maximizing the probability that the MDP behaviours satisfy the linear-time objective is thus reduced to maximizing the probability of reaching the target components.

The maximal reachability probability is computable in polynomial time by reduction to linear programming [1,6]. In practice, however, most model checkers

use value iteration to compute this value [9,17]. The worst-case time complexity of value iteration is pseudo-polynomial. Hence, when implementing model checkers, it is usual for a graph-based pre-processing step to remove as many unnecessary states and transitions as possible while preserving the maximal reachability probability. Well-known reductions include the identification of extremal-probability states and maximal end components [1,5]. The intended outcome of this pre-processing step is a reduction in the number of transition probability values that need to be considered, and hence in the number of iterations required by value iteration.
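For concreteness, the value-iteration scheme mentioned above can be sketched as follows (our encoding: an MDP maps each state to a list of actions, each action being a list of (successor, probability) pairs; real model checkers use refined stopping criteria rather than a fixed iteration count):

```python
# Sketch: value iteration for the maximal reachability probability in an MDP.
# Illustrative encoding, not a production implementation.

def max_reach(mdp, targets, iters=1000):
    """Iterate x(s) = max over actions of the expected value of x at
    successors, with target states pinned to 1."""
    x = {s: (1.0 if s in targets else 0.0) for s in mdp}
    for _ in range(iters):
        x = {s: 1.0 if s in targets
             else max((sum(p * x[t] for t, p in act) for act in mdp[s]),
                      default=0.0)
             for s in mdp}
    return x

# A toy MDP: from s0, one action reaches the target s1 with probability 1/2
# (looping back otherwise); the other action moves to the sink s2.
mdp = {"s0": [[("s1", 0.5), ("s0", 0.5)], [("s2", 1.0)]],
       "s1": [],
       "s2": []}
vals = max_reach(mdp, {"s1"})
```

The optimal strategy from s0 keeps retrying the risky action, so the maximal reachability probability converges to 1; the pseudo-polynomial behavior shows up in how slowly it converges as the probability 1/2 shrinks.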

The main idea behind MDP reduction heuristics is to identify subsets of states from which the maximal probability of reaching the target set of states is the same. Such states are in fact redundant and can be "collapsed". Figure 1 depicts an MDP with actions and probabilities omitted for clarity. From p and q there are strategies to ensure that s is reached with probability 1. The same holds for t. For instance, from p, to get to t almost surely, one plays to go to the distribution directly below q; from q, to the distribution above q. Since from the state p, there is no strategy to ensure that q is reached with probability 1, p and q do not form an *end component*. In fact, to the best of our knowledge, no known MDP reduction heuristic captures this example (i.e., recognizes that p and q have the same maximal reachability probability for all possible values of the transition probabilities).

**Fig. 1.** An MDP with states depicted as circles and distributions as squares. The maximal reachability probability values from *p* and *q* are the same since, from both, one can enforce to reach *s* with probability 1, or *t* with probability 1, using different strategies.

In reinforcement learning the actual probabilities labelling the transitions of an MDP are not assumed to be known in advance. Thus, they have to be estimated by experimenting with different actions in different states and collecting statistics about the observed outcomes [14]. In order for the statistics to be good approximations, the number of experiments has to be high enough. In particular, when the approximations are required to be *probably approximately correct* [23], the necessary and sufficient number of experiments is pseudo-polynomial [13]. Furthermore, the expected number of steps before reaching a particular state even once may already be exponential (even if all the probabilities are fixed). The fact that an excessive amount of experiments is required is a known drawback of reinforcement learning [15,19].

A natural and key question to ask in this context is whether the maximal reachability probability does indeed depend on the actual value of the probability labelling a particular transition of the MDP. If this is not the case, then it need not be learnt. One natural way to remove transition probabilities which do not affect the maximal reachability value is to apply model checking MDP reduction techniques.

*Contributions and Structure of the Paper.* We view the directed graph underlying an MDP as a directed bipartite graph. Vertices in this graph are controlled by players *Protagonist* and *Nature*. Nature is only allowed to choose full-support probability distributions for each one of her vertices, thus instantiating an MDP from the graph; Protagonist has strategies just as he would in an MDP. Hence, we consider infinite families of MDPs with the same support. In the game played between Protagonist and Nature, and for vertices u and v, we are interested in knowing whether the maximal reachability probability from u is never (in any of the MDPs with the game as its underlying directed graph) worse than the same value from v.

In Sect. 2 we give the required definitions. We formalize the *never-worse relation* in Sect. 3. We also show that we can "collapse" sets of equivalent vertices with respect to the NWR (Theorem 1) and remove sub-optimal edges according to the NWR (Theorem 2). Finally, we also argue that the NWR generalizes most known heuristics to reduce MDP size before applying linear programming or value iteration. Then, in Sect. 4 we give a graph-based characterization of the relation (Theorem 3), which in turn gives us a coNP upper bound on its complexity. A matching lower bound is presented in Sect. 5 (Theorem 4). To conclude, we recall and extend an iterative algorithm to efficiently (in polynomial time) under-approximate the never-worse relation from [2].

*Previous and Related Work.* Reductions for MDP model checking were considered in [5,7]. Of the reductions studied in both papers, extremal-probability states, essential states, and end components are computable using only graph-based algorithms. In [3], learning-based techniques are proposed to obtain approximations of the maximal reachability probability in MDPs. Their algorithms, however, do rely on the actual probability values of the MDP.

This work is also related to the widely studied model of interval MDPs, where the transition probabilities are given as intervals meant to model the uncertainty of the numerical values. Numberless MDPs [11] are a particular case of the latter in which values are only known to be zero or non-zero. In the context of numberless MDPs, a special case of the question we study can be simply rephrased as the comparison of the maximal reachability values of two given states.

In [2] a preliminary version of the iterative algorithm we give in Sect. 6 was described, implemented, and shown to be efficient in practice. Proposition 1 was first stated therein. In contrast with [2], we focus chiefly on characterizing the never-worse relation and determining its computational complexity.

### **2 Preliminaries**

We use set-theoretic notation to indicate whether a letter $b \in \Sigma$ *occurs* in a word $\alpha = a_0 \dots a_k \in \Sigma^*$, i.e. $b \in \alpha$ if and only if $b = a_i$ for some $0 \le i \le k$.

Consider a directed graph $G = (V, E)$ and a vertex $u \in V$. We write $uE$ for the set of *successors* of $u$. That is to say, $uE := \{v \in V \mid (u, v) \in E\}$. We say that a path $\pi = u_0 \dots u_k \in V^*$ in $G$ *visits* a vertex $v$ if $v \in \pi$. We also say that $\pi$ is a $v$–$T$ path, for $T \subseteq V$, if $u_0 = v$ and $u_k \in T$.

#### **2.1 Stochastic Models**

Let $S$ be a finite set. We denote by $\mathbb{D}(S)$ the set of all *(rational) probability distributions* on $S$, i.e. the set of all functions $f : S \to \mathbb{Q}_{\ge 0}$ such that $\sum_{s \in S} f(s) = 1$. A probability distribution $f \in \mathbb{D}(S)$ has *full support* if $f(s) > 0$ for all $s \in S$.

**Definition 1 (Markov chains).** *A* Markov chain $\mathcal{C}$ *is a tuple* $(Q, \delta)$ *where* $Q$ *is a finite set of states and* $\delta : Q \to \mathbb{D}(Q)$ *is a probabilistic transition function.*

A *run* of a Markov chain is a finite non-empty word $\rho = p_0 \dots p_n$ over $Q$. We say $\rho$ *reaches* $q$ if $q = p_i$ for some $0 \le i \le n$. The *probability of the run* is $\prod_{0 \le i < n} \delta(p_i, p_{i+1})$.

Let $T \subseteq Q$ be a set of states. The *probability of (eventually) reaching* $T$ in $\mathcal{C}$ from $q_0$, which will be denoted by $\mathbb{P}^{q_0}_{\mathcal{C}}[\lozenge T]$, is the measure of the runs of $\mathcal{C}$ that start at $q_0$ and reach $T$. For convenience, let us first define the *probability of staying in states from* $S \subseteq Q$ *until* $T$ *is reached*<sup>1</sup>, written $\mathbb{P}^{q_0}_{\mathcal{C}}[S \mathbin{U} T]$, as $1$ if $q_0 \in T$ and otherwise

$$\sum \left\{ \prod_{0 \le i < n} \delta(q_i, q_{i+1}) \;\middle|\; q_0 \dots q_n \in (S \setminus T)^* T \text{ for } n \ge 1 \right\}.$$

We then define $\mathbb{P}^{q_0}_{\mathcal{C}}[\lozenge T] := \mathbb{P}^{q_0}_{\mathcal{C}}[Q \mathbin{U} T]$.
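The least-fixed-point characterization behind this definition suggests a simple numerical sketch: start from $0$ everywhere (and $1$ on $T$) and iterate the expectation operator, which approximates $\mathbb{P}^{q_0}_{\mathcal{C}}[\lozenge T]$ from below. The representation (`delta[q]` as a list of `(successor, probability)` pairs) and the fixed iteration count are illustrative assumptions.

```python
# Approximating the reachability probability in a Markov chain from below by
# iterating the fixed-point equations; delta[q] is an (assumed) list of
# (successor, probability) pairs, and n_iters is an illustrative cut-off.
def reach_probability(delta, targets, q0, n_iters=10000):
    values = {q: (1.0 if q in targets else 0.0) for q in delta}
    for _ in range(n_iters):
        values = {q: (1.0 if q in targets else
                      sum(p * values[s] for s, p in delta[q]))
                  for q in delta}
    return values[q0]

# Hypothetical chain: from q, move to t w.p. 1/2, stay w.p. 1/4, get absorbed
# in a non-target sink z w.p. 1/4; the reachability value is (1/2)/(3/4) = 2/3.
chain = {
    "q": [("t", 0.5), ("q", 0.25), ("z", 0.25)],
    "t": [("t", 1.0)],
    "z": [("z", 1.0)],
}
```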

When every run from $q_0$ to $T$ visits some state of a set $U \subseteq Q$ beforehand, the probability of reaching $T$ can be decomposed into a finite sum, as in the lemma below.

**Lemma 1.** *Consider a Markov chain* $\mathcal{C} = (Q, \delta)$*, sets of states* $U, T \subseteq Q$*, and a state* $q_0 \in Q \setminus U$*. If* $\mathbb{P}^{q_0}_{\mathcal{C}}[(Q \setminus U) \mathbin{U} T] = 0$*, then*

$$\mathbb{P}^{q_0}_{\mathcal{C}}[\lozenge T] = \sum_{u \in U} \mathbb{P}^{q_0}_{\mathcal{C}}[(Q \setminus U) \mathbin{U} u]\, \mathbb{P}^{u}_{\mathcal{C}}[\lozenge T].$$

**Definition 2 (Markov decision processes).** *A (finite, discrete-time)* Markov decision process $\mathcal{M}$*, MDP for short, is a tuple* $(Q, A, \delta, T)$ *where* $Q$ *is a finite set of states,* $A$ *a finite set of actions,* $\delta : Q \times A \to \mathbb{D}(Q)$ *a* probabilistic transition function*, and* $T \subseteq Q$ *a set of* target *states.*

For convenience, we write δ(q|p, a) instead of δ(p, a)(q).

<sup>1</sup> *S* U *T* should be read as "*S until T*" and not understood as a set union.

**Definition 3 (Strategies).** *A* (memoryless deterministic) strategy σ *in an MDP* M = (Q, A, δ, T) *is a function* σ : Q → A*.*

Note that we have deliberately defined only memoryless deterministic strategies. This is at no loss of generality since, in this work, we focus on maximizing the probability of reaching a set of states. It is known that for this type of objective, memoryless deterministic strategies suffice [18].

*From MDPs to Chains.* An MDP $\mathcal{M} = (Q, A, \delta, T)$ and a strategy $\sigma$ induce the Markov chain $\mathcal{M}^{\sigma} = (Q, \mu)$ where $\mu(q) = \delta(q, \sigma(q))$ for all $q \in Q$.

**Fig. 2.** On the left we have an MDP with actions {*a, b*}. On the right we have the Markov chain induced by the left MDP and the strategy {*p* ↦ *a*, *q* ↦ *b*}.

*Example 1.* Figure 2 depicts an MDP on the left. Circles represent states; doublecircles, target states; and squares, distributions. The labels on arrows from states to distributions are actions; those on arrows from distributions to states, probabilities.

Consider the strategy σ that plays from p the action a and from q the action b, i.e. σ(p) = a and σ(q) = b. The Markov chain on the right is the chain induced by σ and the MDP on the left. Note that we no longer have action labels.

The probability of reaching a target state from $q$ under $\sigma$ is easily seen to be $3/4$. In other words, if we write $\mathcal{M}$ for the MDP and $T$ for the set of target states, then $\mathbb{P}^{q}_{\mathcal{M}^{\sigma}}[\lozenge T] = \frac{3}{4}$.
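Inducing a chain from an MDP and a memoryless deterministic strategy is a one-line operation under the representation assumed earlier (`delta[state][action]` as a list of `(successor, probability)` pairs); the toy MDP below is hypothetical, not the one of Fig. 2.

```python
# Inducing the Markov chain M^sigma from an MDP and a memoryless deterministic
# strategy: keep, for every state, only the distribution of the chosen action.
def induce_chain(delta, sigma):
    return {q: delta[q][sigma[q]] for q in delta}

# Hypothetical toy MDP and a strategy playing a from p and b from q.
toy_mdp = {
    "p": {"a": [("q", 1.0)], "b": [("p", 1.0)]},
    "q": {"a": [("p", 0.5), ("t", 0.5)], "b": [("t", 1.0)]},
    "t": {"a": [("t", 1.0)]},
}
chain = induce_chain(toy_mdp, {"p": "a", "q": "b", "t": "a"})
```

As in the example, the action labels disappear: each state of the induced chain carries a single distribution.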

#### **2.2 Reachability Games Against Nature**

We will speak about families of MDPs whose probabilistic transition functions have the same support. To do so, we abstract away the probabilities and focus on a game played on a graph. That is, given an MDP $\mathcal{M} = (Q, A, \delta, T)$ we consider its *underlying directed graph* $G_{\mathcal{M}} = (V, E)$ where $V := Q \cup (Q \times A)$ and $E := \{(q, \langle q, a\rangle) \mid q \in Q, a \in A\} \cup \{(\langle p, a\rangle, q) \mid \delta(q \mid p, a) > 0\}$. In $G_{\mathcal{M}}$, *Nature* controls the vertices $Q \times A$. We formalize the game and the *arena* it is played on below.

**Definition 4 (Target arena).** *A* target arena $\mathcal{A}$ *is a tuple* $(V, V_P, E, T)$ *such that* $(V_P, V_N := V \setminus V_P, E)$ *is a bipartite directed graph,* $T \subseteq V_P$ *is a set of* target *vertices, and* $uE \neq \emptyset$ *for all* $u \in V_N$*.*

Informally, there are two agents in a target arena: *Nature*, who controls the vertices in $V_N$, and *Protagonist*, who controls the vertices in $V_P$.

*From Arenas to MDPs.* A target arena $\mathcal{A} = (V, V_P, E, T)$ together with a family of probability distributions $\mu = (\mu_u \in \mathbb{D}(uE))_{u \in V_N}$ induces an MDP. Formally, let $\mathcal{A}_\mu$ be the MDP $(Q, A, \delta, T)$ where $Q = V_P \cup \{\bot\}$, $A = V_N$, $\delta(q \mid p, a)$ is $\mu_a(q)$ if $(p, a), (a, q) \in E$ and $0$ otherwise, and for all $p \in V_P \cup \{\bot\}$ and $a \in A$ we have $\delta(\bot \mid p, a) = 1$ if $(p, a) \notin E$.

*The Value of a Vertex.* Consider a target arena $\mathcal{A} = (V, V_P, E, T)$ and a vertex $v \in V_P$. We define its *(maximal reachability probability) value* with respect to a family of full-support probability distributions $\mu$ as $\text{Val}^{\mu}(v) := \max_{\sigma} \mathbb{P}^{v}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T]$. For $u \in V_N$ we set $\text{Val}^{\mu}(u) := \sum \{\mu_u(v)\,\text{Val}^{\mu}(v) \mid v \in uE\}$.

### **3 The Never-Worse Relation**

We are now in a position to define the relation that we study in this work. Let us fix a target arena A = (V,V<sup>P</sup> ,E,T).

**Definition 5 (The never-worse relation (NWR)).** *A subset* $W \subseteq V$ *of vertices is* never worse *than a vertex* $v \in V$*, written* $v \trianglelefteq W$*, if and only if*

$$\forall \mu = (\mu\_u \in \mathbb{D}(uE))\_{u \in V\_N}, \exists w \in W : \text{Val}^\mu(v) \le \text{Val}^\mu(w).$$

*where all the* $\mu_u$ *have full support. We write* $v \sim w$ *if* $v \trianglelefteq \{w\}$ *and* $w \trianglelefteq \{v\}$*.*

It should be clear from the definition that $\sim$ is an equivalence relation. For $u \in V$, let us denote by $\tilde{u}$ the set of vertices that are $\sim$-equivalent to $u$ and belong to the same owner, i.e. $\tilde{u}$ is $\{v \in V_P \mid v \sim u\}$ if $u \in V_P$ and $\{v \in V_N \mid v \sim u\}$ otherwise.

**Fig. 3.** Two target arenas with $T = \{\mathit{fin}\}$ are shown. Round vertices are elements from $V_P$; square vertices, from $V_N$. In the left target arena we have that $p \trianglelefteq \{q\}$ and $q \trianglelefteq \{p\}$ since any path from either vertex visits $t$ before $T$—see Lemma 1. In the right target arena we have that $t \trianglelefteq \{p\}$—see Proposition 1.

*Example 2.* Consider the left target arena depicted in Fig. 3. Using Lemma 1, it is easy to show that neither p nor q is ever worse than the other since t is visited before *fin* by all paths starting from p or q.
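Since Definition 5 quantifies over all full-support families, sampling random families can only *refute* $v \trianglelefteq W$, never establish it. The sketch below does exactly that: it samples random full-support $\mu$, approximates values by value iteration, and reports a witness family if one is found. The function names, representation, tolerance, and sampling scheme are illustrative assumptions, and numerical error can in principle produce spurious witnesses.

```python
import random

# A sampling-based refuter for v ⊴ W (illustrative sketch).
# succ[u] lists u's successors; VP is the set of Protagonist vertices.
def values_for(succ, VP, T, mu, sweeps=1000):
    # Approximate Val^mu by value iteration from below.
    val = {u: (1.0 if u in T else 0.0) for u in succ}
    for _ in range(sweeps):
        for u in succ:
            if u in T:
                continue
            if u in VP:
                val[u] = max(val[s] for s in succ[u])       # Protagonist: max
            else:
                val[u] = sum(mu[u][s] * val[s] for s in succ[u])  # Nature: avg
    return val

def refute_nwr(succ, VP, T, v, W, samples=20, tol=1e-6):
    VN = set(succ) - VP
    for _ in range(samples):
        mu = {}
        for u in VN:  # random full-support distribution over u's successors
            weights = [random.random() + 1e-3 for _ in succ[u]]
            total = sum(weights)
            mu[u] = {s: w / total for s, w in zip(succ[u], weights)}
        val = values_for(succ, VP, T, mu)
        if all(val[v] > val[w] + tol for w in W):
            return True   # witness family found: v ⊴ W does not hold
    return False          # inconclusive: no witness among the samples
```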

The literature contains various heuristics which consist in computing sets of states and "collapsing" them to reduce the size of the MDP without affecting the maximal reachability probability of the remaining states. We now show that we can collapse equivalence classes and, further, remove sub-optimal distributions using the NWR.

#### **3.1 The Usefulness of the NWR**

We will now formalize the idea of "collapsing" equivalent vertices with respect to the NWR. For convenience, we will also remove self-loops while doing so.

Consider a target arena $\mathcal{A} = (V, V_P, E, T)$. We denote by $\mathcal{A}/_{\sim}$ its $\sim$*-quotient*. That is, $\mathcal{A}/_{\sim}$ is the target arena $(S, S_P, R, U)$ where $S_P = \{\tilde{v} \mid v \in V_P\}$, $S = \{\tilde{v} \mid v \in V_N\} \cup S_P$, $U = \{\tilde{t} \mid t \in T\}$, and

$$\begin{aligned} R = {} & \{ (\tilde{u}, \tilde{v}) \mid \exists (u, v) \in (V_P \times V_N) \cap E : vE \setminus \tilde{u} \neq \emptyset \} \\ & \cup \{ (\tilde{u}, \tilde{v}) \mid \exists (u, v) \in (V_N \times V_P) \cap E \}. \end{aligned}$$

For a family $\mu = (\mu_u \in \mathbb{D}(uE))_{u \in V_N}$ of full-support distributions, we denote by $\mu/_{\sim}$ the family $\nu = (\nu_{\tilde{u}} \in \mathbb{D}(\tilde{u}R))_{\tilde{u} \in S_N}$ defined as follows. For all $\tilde{u} \in S_N$ and all $\tilde{v} \in \tilde{u}R$ we have $\nu_{\tilde{u}}(\tilde{v}) = \sum_{w \in \tilde{v}} \mu_u(w)$, where $u$ is any element of $\tilde{u}$.
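Assuming the $\sim$-classes are already known, the quotient construction (including the removal of would-be self-loops) can be sketched as follows; the set-based representation is an illustrative assumption, with `cls` mapping every vertex to its class as a frozenset of vertices of the same owner.

```python
# A sketch of the ~-quotient construction.  Edges from a Protagonist class to
# a Nature class are kept only if the Nature vertex has a successor outside
# the Protagonist class (this is the self-loop removal in the definition).
def quotient(VP, VN, E, T, cls):
    succ = {v: {w for (x, w) in E if x == v} for v in VP | VN}
    SP = {cls[v] for v in VP}
    SN = {cls[v] for v in VN}
    U = {cls[t] for t in T}
    R = set()
    for (u, v) in E:
        if u in VP:
            if succ[v] - cls[u]:  # v has a successor outside u's class
                R.add((cls[u], cls[v]))
        else:
            R.add((cls[u], cls[v]))
    return SP | SN, SP, R, U
```

In the usage below, $p \sim q$ and the only Nature vertex $x$ has all its successors inside the class of $p$, so both edges into $x$ are dropped.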

The following property of the ∼-quotient follows from the fact that all the vertices in ˜v have the same maximal probability of reaching the target vertices.

**Theorem 1.** *Consider a target arena* $\mathcal{A} = (V, V_P, E, T)$*. For all families* $\mu = (\mu_u \in \mathbb{D}(uE))_{u \in V_N}$ *of full-support probability distributions and all* $v \in V_P$ *we have*

$$\max_{\sigma} \mathbb{P}^{v}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T] = \max_{\sigma'} \mathbb{P}^{\tilde{v}}_{\mathcal{B}^{\sigma'}_{\nu}}[\lozenge U],$$

*where* $\mathcal{B} = \mathcal{A}/_{\sim}$*,* $\nu = \mu/_{\sim}$*, and* $U = \{\tilde{t} \mid t \in T\}$*.*

We can further remove edges that lead to sub-optimal Nature vertices. When this is done after ∼-quotienting the maximal reachability probabilities are preserved.

**Theorem 2.** *Consider a target arena* $\mathcal{A} = (V, V_P, E, T)$ *such that* $\mathcal{A}/_{\sim} = \mathcal{A}$*. For all families* $\mu = (\mu_u \in \mathbb{D}(uE))_{u \in V_N}$ *of full-support probability distributions, all* $(w, x) \in E \cap (V_P \times V_N)$ *such that* $x \trianglelefteq (wE \setminus \{x\})$*, and all* $v \in V_P$ *we have*

$$\max_{\sigma} \mathbb{P}^{v}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T] = \max_{\sigma'} \mathbb{P}^{v}_{\mathcal{B}^{\sigma'}_{\mu}}[\lozenge T],$$

*where* $\mathcal{B} = (V, V_P, E \setminus \{(w, x)\}, T)$*.*

#### **3.2 Known Efficiently-Computable Special Cases**

We now recall the definitions of extremal-probability states, end components, and essential states. We then observe that, for each of these sets of states, the maximal reachability probabilities of its elements coincide and its definition is independent of the probabilities labelling the transitions of the MDP. Hence, each such set is contained in an equivalence class induced by $\sim$.

**Extremal-Probability States.** The set of *extremal-probability states* of an MDP $\mathcal{M} = (Q, A, \delta, T)$ consists of the states with maximal reachability probability $0$ or $1$. Both sets can be computed in polynomial time [1,4]. We give below a game-based definition of both sets inspired by the classical polynomial-time algorithm to compute them (see, e.g., [1]). Let us fix a target arena $\mathcal{A} = (V, V_P, E, T)$ for the sequel.

For a set $T \subseteq V$, let us write $\mathbf{Z}_T := \{v \in V \mid T \text{ is not reachable from } v\}$.
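$\mathbf{Z}_T$ is computable by a backward search from $T$ over the edge relation; a sketch, assuming vertices and edges are given as plain sets:

```python
# Computing Z_T, the vertices from which T is unreachable, by a backward
# breadth/depth-first search from T.
def unreachable_set(vertices, edges, T):
    pred = {v: set() for v in vertices}
    for (u, v) in edges:
        pred[v].add(u)
    can_reach, stack = set(T), list(T)
    while stack:
        v = stack.pop()
        for u in pred[v]:
            if u not in can_reach:
                can_reach.add(u)
                stack.append(u)
    return vertices - can_reach
```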

*(Almost-Surely Winning) Strategies.* A strategy for Protagonist in a target arena is a function $\sigma : V_P \to V_N$. We then say that a path $v_0 \dots v_n \in V^*$ is *consistent with* $\sigma$ if $v_i \in V_P \implies \sigma(v_i) = v_{i+1}$ for all $0 \le i < n$. Let $\mathbf{Reach}(v_0, \sigma)$ denote the set of vertices reachable from $v_0$ under $\sigma$, i.e. $\mathbf{Reach}(v_0, \sigma) := \{v_k \mid v_0 \dots v_k \text{ is a path consistent with } \sigma\}$.

We say that a strategy $\sigma$ for Protagonist is *almost-surely winning from* $u_0 \in V$ *to* $T \subseteq V_P$ if, after modifying the arena to make all $t \in T$ into sinks, for all $v_0 \in \mathbf{Reach}(u_0, \sigma)$ we have $\mathbf{Reach}(v_0, \sigma) \cap T \neq \emptyset$. We denote the set of all such strategies by $\mathbf{Win}^{u_0}_{T}$.

The following properties regarding almost-surely winning strategies in a target arena follow from the correctness of the graph-based algorithm used to compute extremal-probability states in an MDP [1, Lemma 10.108].

**Lemma 2 (From** [1]**).** *Consider a target arena* $\mathcal{A} = (V, V_P, E, T)$*. For all families* $\mu = (\mu_u \in \mathbb{D}(uE))_{u \in V_N}$ *of full-support probability distributions and all* $v \in V_P$ *the following hold.*

*(i)* $\max_{\sigma} \mathbb{P}^{v}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T] = 0 \iff v \in \mathbf{Z}_T$
*(ii)* $\forall \sigma : \sigma \in \mathbf{Win}^{v}_{T} \iff \mathbb{P}^{v}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T] = 1$

**End Components.** Let us consider an MDP $\mathcal{M} = (Q, A, \delta, T)$. A set $S \subseteq Q$ of states is an *end component* in $\mathcal{M}$ if for all pairs of states $p, q \in S$ there exists a strategy $\sigma$ such that $\mathbb{P}^{p}_{\mathcal{M}^{\sigma}}[S \mathbin{U} q] = 1$.

*Example 3.* Let us consider the MDP shown on the left in Fig. 2. The set {p, q} is an end component since, by playing a from both states, one can ensure to reach either state from the other with probability 1.

It follows immediately from the definition of end component that the maximal probability of reaching T from states in the same end component is the same.

**Lemma 3.** *Let* $S \subseteq Q$ *be an end component in* $\mathcal{M}$*. For all* $p, q \in S$ *we have that* $\max_{\sigma} \mathbb{P}^{p}_{\mathcal{M}^{\sigma}}[\lozenge T] = \max_{\sigma'} \mathbb{P}^{q}_{\mathcal{M}^{\sigma'}}[\lozenge T]$*.*

We say an end component is *maximal* if it is maximal with respect to set inclusion. Furthermore, from the definition of end components in MDPs and Lemma 2 it follows that we can lift the notion of end component to target arenas. More precisely, a set $S \subseteq V_P$ is an end component in $\mathcal{A}$ if and only if for some family of full-support probability distributions $\mu$ we have that $S$ is an end component in $\mathcal{A}_\mu$ (if and only if for all $\mu'$ the set $S$ is an end component in $\mathcal{A}_{\mu'}$).

The set of all maximal end components of a target arena can be computed in polynomial time using an algorithm based on the strongly connected components of the graph [1,8].
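Under the full-support assumption, one standard way to check whether a given set $S$ is an end component is purely graph-based: keep, for each state of $S$, the actions whose whole support stays inside $S$, require that every state keeps at least one, and test strong connectivity of the induced graph. The sketch below implements this check under the representation assumed earlier; treating a singleton as an end component only if it has a self-supporting action is a convention of this sketch, not a claim from the paper.

```python
# Graph-based end-component check.  delta[q][a] is an (assumed) list of
# (successor, probability) pairs with positive probabilities.
def is_end_component(delta, S):
    kept = {}
    for q in S:
        # Keep only actions whose whole support stays inside S.
        kept[q] = [a for a in delta[q]
                   if all(s in S for s, _ in delta[q][a])]
        if not kept[q]:
            return False

    def reachable(start, edges):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in edges[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    # S is an end component iff the graph of kept actions is strongly
    # connected: everything is reachable from (and can reach) one vertex.
    fwd = {q: {s for a in kept[q] for s, _ in delta[q][a]} for q in S}
    bwd = {q: set() for q in S}
    for q in S:
        for v in fwd[q]:
            bwd[v].add(q)
    start = next(iter(S))
    return reachable(start, fwd) == set(S) == reachable(start, bwd)
```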

**Essential States.** Consider a target arena $\mathcal{A} = (V, V_P, E, T)$ and let $\rightsquigarrow$ be the smallest relation satisfying the following. For all $u \in V_P$ we have $u \rightsquigarrow u$. For all $u_0, v \in V_P \setminus \mathbf{Z}_T$ such that $u_0 \neq v$, we have $u_0 \rightsquigarrow v$ if for all paths $u_0 u_1 u_2$ we have that $u_2 \rightsquigarrow v$, and there is at least one such path. Intuitively, $u \rightsquigarrow v$ holds whenever all paths starting from $u$ reach $v$. In [7], the maximal vertices according to $\rightsquigarrow$ are called *essential states*<sup>2</sup>.

**Lemma 4 (From** [7]**).** *Consider a target arena* $\mathcal{A} = (V, V_P, E, T)$*. For all families* $\mu = (\mu_u \in \mathbb{D}(uE))_{u \in V_N}$ *of full-support probability distributions, all* $v \in V_P$*, and all essential states* $w$*, if* $v \rightsquigarrow w$ *then* $\max_{\sigma} \mathbb{P}^{v}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T] = \max_{\sigma'} \mathbb{P}^{w}_{\mathcal{A}^{\sigma'}_{\mu}}[\lozenge T]$*.*

Note that, in the left arena in Fig. 3, $p \rightsquigarrow t$ does not hold since there is a cycle between $p$ and $q$ which does not visit $t$.

It was also shown in [7] that the relation $\rightsquigarrow$ is computable in polynomial time.

#### **4 Graph-Based Characterization of the NWR**

In this section we give a characterization of the NWR that is reminiscent of the topological value iteration proposed in [5]. The main intuition behind our characterization is as follows. If $v \trianglelefteq W$ does not hold, then for all $0 < \varepsilon < 1$ there is some family $\mu$ of full-support distributions such that $\text{Val}^{\mu}(v)$ is at least $1 - \varepsilon$, while $\text{Val}^{\mu}(w)$ is at most $\varepsilon$ for all $w \in W$. In turn, this must mean that there is a path from $v$ to $T$ which can be assigned a high probability by $\mu$ while, from $W$, all paths go with high probability to $\mathbf{Z}_T$.

We capture the idea of separating a "good" $v$–$T$ path from all paths starting from $W$ by using a partition of $V$ into layers $S_i \subseteq V$. Intuitively, we would like it to be easy to construct a family $\mu$ of probability distributions such that, from all vertices in $S_{i+1}$, all paths going to vertices outside of $S_{i+1}$ end up, with high probability, in lower layers, i.e. some $S_k$ with $k < i$. A formal definition follows.

**Definition 6 (Drift partition and vertices).** *Consider a target arena* $\mathcal{A} = (V, V_P, E, T)$ *and a partition* $(S_i)_{0 \le i \le k}$ *of* $V$*. For all* $0 \le i \le k$*, let* $S_i^+ := \cup_{i < j} S_j$ *and* $S_i^- := \cup_{j < i} S_j$*, and let* $D_i := \{v \in S_i \cap V_N \mid vE \cap S_i^- \neq \emptyset\}$*. We define the set* $D := \cup_{0 \le i \le k} D_i$ *of* drift vertices*. The partition is called a* drift partition *if the following hold.*

*(i)* *For all* $0 \le i \le k$ *and all* $u \in S_i \cap V_P$*, we have* $uE \cap S_i^+ = \emptyset$*.*
*(ii)* *For all* $0 \le i \le k$ *and all* $u \in (S_i \cap V_N) \setminus D_i$*, we have* $uE \cap S_i^+ = \emptyset$*.*
<sup>2</sup> This is not the usual notion of essential states from classical Markov chain theory.

Using drift partitions, we can now formalize our characterization of the negation of the NWR.

**Theorem 3.** *Consider a target arena* $\mathcal{A} = (V, V_P, E, T)$*, a non-empty set of vertices* $W \subseteq V$*, and a vertex* $v \in V$*. The following are equivalent.*

*(i)* $v \trianglelefteq W$ *does not hold.*
*(ii)* *There exist a drift partition* $(S_i)_{0 \le i \le k}$ *of* $V$ *with* $T \subseteq S_k$ *and* $W \subseteq S_k^-$*, and a simple* $v$–$T$ *path* $\pi$ *with* $\pi \subseteq S_k$*.*
Before proving Theorem 3 we need an additional definition and two intermediate results.

**Definition 7 (Value-monotone paths).** *Let* $\mathcal{A} = (V, V_P, E, T)$ *be a target arena and consider a family of full-support probability distributions* $\mu = (\mu_u \in \mathbb{D}(uE))_{u \in V_N}$*. A path* $v_0 \dots v_k$ *is* $\mu$-non-increasing *if and only if* $\text{Val}^{\mu}(v_{i+1}) \le \text{Val}^{\mu}(v_i)$ *for all* $0 \le i < k$*; it is* $\mu$-non-decreasing *if and only if* $\text{Val}^{\mu}(v_i) \le \text{Val}^{\mu}(v_{i+1})$ *for all* $0 \le i < k$*.*

It can be shown that from any path in a target arena ending in T one can obtain a simple non-decreasing one.

**Lemma 5.** *Consider a target arena* $\mathcal{A} = (V, V_P, E, T)$ *and a family of full-support probability distributions* $\mu = (\mu_u \in \mathbb{D}(uE))_{u \in V_N}$*. If there is a path from some* $v \in V$ *to* $T$*, there is also a simple* $\mu$*-non-decreasing one.*

Additionally, we will make use of the following properties regarding vertex values. They formalize the relation between the value of a vertex, its owner, and the values of its successors.

**Lemma 6.** *Consider a target arena* $\mathcal{A} = (V, V_P, E, T)$ *and a family of full-support probability distributions* $\mu = (\mu_u \in \mathbb{D}(uE))_{u \in V_N}$*.*

*(i) For all* $u \in V_P$ *and all successors* $v \in uE$ *it holds that* $\text{Val}^{\mu}(v) \le \text{Val}^{\mu}(u)$*. (ii) For all* $u \in V_N$ *it holds that*

$$(\exists v \in uE : \text{Val}^{\mu}(u) < \text{Val}^{\mu}(v)) \implies (\exists w \in uE : \text{Val}^{\mu}(w) < \text{Val}^{\mu}(u)).$$

*Proof (of Theorem* 3*).* Recall that, by definition, (i) holds if and only if there exists a family $\mu = (\mu_u \in \mathbb{D}(uE))_{u \in V_N}$ of full-support probability distributions such that $\forall w \in W : \text{Val}^{\mu}(w) < \text{Val}^{\mu}(v)$.

Let us prove (i) $\implies$ (ii). Let $x_0 < x_1 < \dots$ be the finitely many (i.e. at most $|V|$) values that occur in the MDP $\mathcal{A}_\mu$, and let $k$ be such that $\text{Val}^{\mu}(v) = x_k$. For all $0 \le i < k$ let $S_i := \{u \in V \mid \text{Val}^{\mu}(u) = x_i\}$, and let $S_k := V \setminus \cup_{i < k} S_i$. Let us show below that the $S_i$ form a drift partition.


We have that $\text{Val}^{\mu}(w) < \text{Val}^{\mu}(v) = x_k$ for all $w \in W$ by assumption, so $W \subseteq S_k^-$ by construction. By Lemma 5 there exists a simple $\mu$-non-decreasing path $\pi$ from $v$ to $T$, so all the vertices occurring in $\pi$ have value at least $\text{Val}^{\mu}(v)$, and hence $\pi \subseteq S_k$.

We will prove (ii) $\implies$ (i) by defining a suitable full-support distribution family $\mu$. The definition will be partial only: first on $\pi \cap V_N$, and then on the drift vertices in $V \setminus S_k$. Let $0 < \varepsilon < 1$, which is meant to be small enough. Let us write $\pi = v_0 \dots v_n$ so that $v_0 = v$ and $v_n \in T$. Let us define $\mu$ on $\pi \cap V_N$ as follows: for all $i < n$, if $v_i \in V_N$ let $\mu_{v_i}(v_{i+1}) := 1 - \varepsilon$. Let $\sigma$ be an arbitrary Protagonist strategy such that for all $i < n$, if $v_i \in V_P$ then $\sigma(v_i) := v_{i+1}$. Therefore

$$\begin{aligned} (1 - \varepsilon)^{|V|} &\le (1 - \varepsilon)^n &&\text{since } \pi \text{ is simple} \\ &\le \prod_{i < n,\, v_i \in V_N} \mu_{v_i}(v_{i+1}) &&\text{by definition of } \mu \\ &\le \mathbb{P}^{v}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T] \\ &\le \max_{\sigma'} \mathbb{P}^{v}_{\mathcal{A}^{\sigma'}_{\mu}}[\lozenge T] = \text{Val}^{\mu}(v). \end{aligned} \tag{1}$$

So, for $0 < \varepsilon < 1 - \frac{1}{\sqrt[|V|]{2}}$, we have $\frac{1}{2} < (1 - \varepsilon)^{|V|} \le \text{Val}^{\mu}(v)$. Below we will further define $\mu$ such that $\text{Val}^{\mu}(w) \le 1 - (1 - \varepsilon)^{|V|} < \frac{1}{2}$ for all $w \in W$ and all $0 < \varepsilon < 1 - \frac{1}{\sqrt[|V|]{2}}$, which will prove (ii) $\implies$ (i). However, the last part of the proof is more difficult.

For all $1 \le i \le k$ and all drift vertices $u \in S_i$, let $\varrho(u)$ be a successor of $u$ in $S_i^-$. Such a $\varrho(u)$ exists by definition of the drift vertices. Then let $\mu_u(\varrho(u)) := 1 - \varepsilon$. We then claim that

$$\forall u \in D: (1 - \varepsilon)(1 - \mathbb{P}^{\varrho(u)}\_{\mathcal{A}\_{\mu}^{\sigma}}[\Diamond T]) \le 1 - \mathbb{P}^{u}\_{\mathcal{A}\_{\mu}^{\sigma}}[\Diamond T]. \tag{2}$$

Indeed, $1 - \mathbb{P}^{u}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T]$ is the probability that, starting at $u$ and following $\sigma$, $T$ is never reached; and $(1 - \varepsilon)(1 - \mathbb{P}^{\varrho(u)}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T])$ is the probability that, starting at $u$ and following $\sigma$, the second vertex is $\varrho(u)$ and $T$ is never reached.

Now let σ be an arbitrary strategy, and let us prove the following by induction on j.

$$\forall 0 \le j < k, \forall w \in S\_j \cup S\_j^- : \mathbb{P}\_{\mathcal{A}\_\mu^\sigma}^w[\Diamond T] \le 1 - (1 - \varepsilon)^j$$

Base case, $j = 0$: by assumption $W$ is non-empty and included in $S_k^-$, so $0 < k$. Also by assumption $T \subseteq S_k$, so $T \cap S_0 = \emptyset$. By definition of a drift partition, there are no edges going out of $S_0$, regardless of whether the starting vertex is in $V_P$ or $V_N$. So there is no path from $w$ to $T$, which implies $\text{Val}^{\mu}(w) = 0$ for all $w \in S_0$, and the claim holds for the base case.

Inductive case: let $w \in S_j$, let $D' := D \cap (S_j \cup S_j^-)$, and let us argue that every path $\pi$ from $w$ to $T$ must at some point leave $S_j \cup S_j^-$ to reach a vertex with a higher index, i.e. there is some edge $(\pi_i, \pi_{i+1})$ from $\pi_i \in S_j \cup S_j^-$ to some $\pi_{i+1} \in S_\ell$ with $j < \ell$. By definition of a drift partition, $\pi_i$ must also be a drift vertex, i.e. $\pi_i \in D'$. Thus, if we let $F := V_P \setminus D'$, Lemma 1 implies that $\mathbb{P}^{w}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T] = \sum_{u \in D'} \mathbb{P}^{w}_{\mathcal{A}^{\sigma}_{\mu}}[F \mathbin{U} u]\, \mathbb{P}^{u}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T]$. Now, since

$$\begin{aligned}
\sum_{u \in D'} \mathbb{P}^{u}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T]
&= \sum_{u \in D \cap S_j^-} \mathbb{P}^{u}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T] + \sum_{u \in D_j} \mathbb{P}^{u}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T] &&\text{by splitting the sum} \\
&\le \sum_{u \in D \cap S_j^-} \mathbb{P}^{u}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T] + \sum_{u \in D_j} \left(1 - (1 - \varepsilon)\left(1 - \mathbb{P}^{\varrho(u)}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T]\right)\right) &&\text{by (2)} \\
&\le \sum_{u \in D \cap S_j^-} \left(1 - (1 - \varepsilon)^{j-1}\right) + \sum_{u \in D_j} \left(1 - (1 - \varepsilon)(1 - \varepsilon)^{j-1}\right) &&\text{induction hypothesis and } \forall x \in D_j : \varrho(x) \in S_j^- \\
&\le \sum_{u \in D'} \left(1 - (1 - \varepsilon)^{j}\right) &&(1 - \varepsilon)^{j} \le (1 - \varepsilon)^{j-1}
\end{aligned}$$

and $\sum_{u \in D'} \mathbb{P}^{w}_{\mathcal{A}^{\sigma}_{\mu}}[F \mathbin{U} u] \le 1$, we have that $\mathbb{P}^{w}_{\mathcal{A}^{\sigma}_{\mu}}[\lozenge T] \le 1 - (1 - \varepsilon)^j$. The induction is thus complete. Since $\sigma$ is arbitrary in the calculations above, and since $j < k \le |V|$, we find that $\text{Val}^{\mu}(w) \le 1 - (1 - \varepsilon)^{|V|}$ for all $w \in W \subseteq S_k^-$.

For $0 < \varepsilon < 1 - \frac{1}{\sqrt[|V|]{2}}$ we have $\frac{1}{2} < (1-\varepsilon)^{|V|}$, as mentioned after (1), so $\mathrm{Val}^{\mu}(w) \leq 1 - (1-\varepsilon)^{|V|} < \frac{1}{2}$.
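The threshold on $\varepsilon$ used here is simply the result of solving $(1-\varepsilon)^{|V|} > \frac{1}{2}$ for $\varepsilon$:

$$(1-\varepsilon)^{|V|} > \tfrac{1}{2} \iff 1 - \varepsilon > 2^{-1/|V|} \iff \varepsilon < 1 - \frac{1}{\sqrt[|V|]{2}}.$$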

### **5 Intractability of the NWR**

It follows from Theorem 3 that we can decide whether a vertex is sometimes worse than a set of vertices by guessing a partition of the vertices and verifying that it is a drift partition. The verification can clearly be done in polynomial time.
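The polynomial-time verification step can be sketched as follows. This is our own reading of the drift-partition conditions as they are used later in the proof of Theorem 4 (Protagonist vertices never move to a higher layer; a Nature vertex that can move up can also move down); all names are ours, not the paper's.

```python
def verify_drift_partition(layers, succ, protagonist):
    """Check the drift-partition conditions sketched in the text.

    layers:      list of disjoint vertex sets S_0, ..., S_k (a partition)
    succ:        dict mapping each vertex to its set of successors
    protagonist: set of Protagonist vertices (all others belong to Nature)
    """
    index = {v: i for i, layer in enumerate(layers) for v in layer}
    for v, i in index.items():
        higher = [w for w in succ[v] if index[w] > i]
        lower = [w for w in succ[v] if index[w] < i]
        if v in protagonist:
            # Protagonist vertices may never move to a higher layer.
            if higher:
                return False
        else:
            # A Nature vertex that can move up must also be able to move down.
            if higher and not lower:
                return False
    return True
```

Since the partition has at most $|V|$ layers and each edge is inspected once, the check runs in polynomial time, matching the coNP upper bound of Corollary 1.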

**Corollary 1.** *Given a target arena* $\mathcal{A} = (V, V_P, E, T)$*, a non-empty set* $W \subseteq V$*, and a vertex* $v \in V$*, determining whether* $v \preceq W$ *is decidable and in* coNP*.*

We will now show that the problem is in fact coNP-complete already for Markov chains.

**Theorem 4.** *Given a target arena* $\mathcal{A} = (V, V_P, E, T)$*, a non-empty vertex set* $W \subseteq V$*, and a vertex* $v \in V$*, determining whether* $v \preceq W$ *is* coNP*-complete even if* $|uE| = 1$ *for all* $u \in V_P$*.*

The idea is to reduce the 2-Disjoint Paths problem (2DP) to the existence of a drift partition witnessing that $v \preceq \{w\}$ does not hold, for some $v \in V$. Recall that 2DP asks, given a directed graph $G = (V, E)$ and vertex pairs $(s_1, t_1), (s_2, t_2) \in V \times V$, whether there exist an $s_1$–$t_1$ path $\pi_1$ and an $s_2$–$t_2$ path $\pi_2$ such that $\pi_1$ and $\pi_2$ are vertex disjoint, i.e. $\pi_1 \cap \pi_2 = \emptyset$. The problem is known to be NP-complete [10,12]. In the sequel, we assume without loss of generality that (a) $t_1$ and $t_2$ are reachable from all $s \in V \setminus \{t_1, t_2\}$; and (b) $t_1$ and $t_2$ are the only sinks in $G$.

*Proof (of Theorem 4).* From the 2DP input instance, we construct the target arena $\mathcal{A} = (S, S_P, R, T)$ with $S := V \cup E$, $R := \{(u, \langle u, v\rangle), (\langle u, v\rangle, v) \in S \times S \mid (u, v) \in E \text{ or } u = v \in \{t_1, t_2\}\}$, $S_P := V \times V$, and $T := \{t_1, \langle t_1, t_1\rangle\}$. We will show there are vertex-disjoint $s_1$–$t_1$ and $s_2$–$t_2$ paths in $G$ if and only if there is a drift partition $(S_i)_{0 \leq i \leq k}$ and a simple $s_1$–$t_1$ path $\pi$ such that $\pi \subseteq S_k$ and $s_2 \in S_k^-$. The result will then follow from Theorem 3.

Suppose we have a drift partition $(S_i)_{0 \leq i \leq k}$ with $s_2 \in S_k^-$ and a simple path $\pi = v_0 \langle v_0, v_1\rangle \ldots \langle v_{n-1}, v_n\rangle v_n$ with $v_0 = s_1$, $v_n = t_1$. Since the set $\{t_2, \langle t_2, t_2\rangle\}$ is *trapping* in $\mathcal{A}$, i.e. all paths from vertices in the set visit only vertices from it, we can assume that $S_0 = \{t_2, \langle t_2, t_2\rangle\}$. (Indeed, for any drift partition, one can obtain a new drift partition by moving any trapping set to a new lowest layer.) Now, using the assumption that $t_2$ is reachable from all $s \in V \setminus \{t_1, t_2\}$, one can show by induction that for all $0 \leq j < k$ and all $u_0 \in S_j$ there is a path $u_0 \ldots u_m$ in $G$ with $u_m = t_2$ and $\{u_0, \ldots, u_m\} \subseteq S_{j+1}^-$. This implies that there is an $s_2$–$t_2$ path $\pi_2$ in $G$ such that $\pi_2 \subseteq S_k^-$. It follows that $\pi_2$ is vertex disjoint with the $s_1$–$t_1$ path $v_0 \ldots v_n$ in $G$.

Now, let us suppose that we have vertex-disjoint $s_1$–$t_1$ and $s_2$–$t_2$ paths $\pi_1 = u_0 \ldots u_n$ and $\pi_2 = v_0 \ldots v_m$. Clearly, we can assume both $\pi_1, \pi_2$ are simple. We will construct a partition $(S_i)_{0 \leq i \leq m+1}$ and show that it is indeed a drift partition, that $u_0 \langle u_0, u_1\rangle \ldots \langle u_{n-1}, u_n\rangle u_n \subseteq S_{m+1}$, and $s_2 = v_0 \in S_{m+1}^-$. Let us set $S_0 := \{\langle v_{m-1}, v_m\rangle, v_m, t_2, \langle t_2, t_2\rangle\}$, $S_i := \{\langle v_{m-i-1}, v_{m-i}\rangle, v_{m-i}\}$ for all $0 < i \leq m$, and $S_{m+1} := S \setminus \bigcup_{0 \leq i \leq m} S_i$. Since $\pi_2$ is simple, $(S_i)_{0 \leq i \leq m+1}$ is a partition of $S$. Furthermore, we have that $s_2 = v_0 \in S_{m+1}^-$, and $u_0 \langle u_0, u_1\rangle \ldots \langle u_{n-1}, u_n\rangle u_n \subseteq S_{m+1}$ since $\pi_1$ and $\pi_2$ are vertex disjoint. Thus, it only remains for us to argue that for all $0 \leq i \leq m+1$: for all $w \in S_i \cap S_P$ we have $wR \cap S_i^+ = \emptyset$, and for all $w \in S_i \cap S_N$ we have $wR \cap S_i^+ \neq \emptyset \implies wR \cap S_i^- \neq \emptyset$. By construction of the $S_i$, we have that $eR \subseteq S_i$ for all $0 \leq i \leq m$ and all $e \in S_i \cap S_P$. Furthermore, for all $0 < i \leq m$ and all $x \in S_i \cap S_N = \{v_{m-i}\}$, there exists $y \in S_{i-1} \cap S_P = \{\langle v_{m-i}, v_{m-i+1}\rangle\}$ such that $(x, y) \in R$, induced by $(v_{m-i}, v_{m-i+1}) \in E$ from $\pi_2$.
To conclude, we observe that since $S_0 = \{\langle v_{m-1}, v_m\rangle, v_m = t_2, \langle t_2, t_2\rangle\}$ and $\{t_2, \langle t_2, t_2\rangle\}$ is trapping in $\mathcal{A}$, the set $t_2 R$ is contained in $S_0$.

#### **6 Efficiently Under-Approximating the NWR**

Although the full NWR cannot be efficiently computed for a given MDP, we can hope for "under-approximations" that are accurate and efficiently computable.

**Definition 8 (Under-approximation of the NWR).** *Let* $\mathcal{A} = (V, V_P, E, T)$ *be a target arena and consider a relation* ${\sqsubseteq} \subseteq V \times \mathcal{P}(V)$*. The relation* $\sqsubseteq$ *is an* under-approximation *of the NWR if and only if* ${\sqsubseteq} \subseteq {\preceq}$*.*

We denote by $\sqsubseteq^*$ the *pseudo transitive closure* of $\sqsubseteq$. That is, $\sqsubseteq^*$ is the smallest relation such that ${\sqsubseteq} \subseteq {\sqsubseteq^*}$ and, for all $u \in V$ and $X \subseteq V$, if there exists $W \subseteq V$ such that $u \sqsubseteq^* W$ and $w \sqsubseteq^* X$ for all $w \in W$, then $u \sqsubseteq^* X$.
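For a finite relation, the pseudo transitive closure can be computed by straightforward saturation. A minimal sketch, representing the relation as a set of (vertex, frozenset-of-vertices) pairs; the function name is ours:

```python
def pseudo_transitive_closure(rel):
    """Saturate rel under the pseudo-transitivity rule:
    if u ~* W and w ~* X for every w in W, then u ~* X."""
    closure = set(rel)
    changed = True
    while changed:
        changed = False
        sets = {X for _, X in closure}  # candidate right-hand sides X
        for u, W in list(closure):
            for X in sets:
                # Add (u, X) once every member of W is already related to X.
                if (u, X) not in closure and all((w, X) in closure for w in W):
                    closure.add((u, X))
                    changed = True
    return closure
```

The loop adds at least one pair per iteration and the number of pairs appearing this way is finite, so it terminates; this naive version is only meant to make the definition concrete, not to be efficient.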

*Remark 1.* The empty set is an under-approximation of the NWR. For all under-approximations $\sqsubseteq$ of the NWR, the pseudo transitive closure $\sqsubseteq^*$ of $\sqsubseteq$ is also an under-approximation of the NWR.

In [2], efficiently-decidable sufficient conditions for the NWR were given. In particular, those conditions suffice to infer relations such as those in the right MDP from Fig. 3. We recall (Proposition 1) and extend (Proposition 2) these conditions below.

**Proposition 1 (From** [2]**).** *Consider a target arena* $\mathcal{A} = (V, V_P, E, T)$ *and an under-approximation* $\sqsubseteq$ *of the NWR. For all vertices* $v_0 \in V$ *and sets* $W \subseteq V$*, the following hold.*


*Proof (Sketch).* The main idea of the proof of item (i) is to note that $S$ is visited before $T$. The desired result then follows from Lemma 1. For item (ii), we intuitively have a strategy that either visits $T$ with some probability or visits $W$, from where the chances of visiting $T$ are worse than before. We then show that it is never worse to start from $v_0$, which gives better odds of visiting $T$.

The above "rules" give an iterative algorithm to obtain increasingly better under-approximations of the NWR: from <sup>i</sup> apply the rules and obtain a new under-approximation i+1 by adding the new pairs and taking the pseudo transitive closure; then repeat until convergence. Using the special cases from Sect. 3.2 we can obtain a nontrivial initial under-approximation <sup>0</sup> of the NWR in polynomial time.

The main problem is how to avoid testing all subsets W ⊆ V in every iteration. One natural way to ensure we do not consider all subsets of vertices in every iteration is to apply the rules from Proposition 1 only on the successors of Protagonist vertices.

In the same spirit of the iterative algorithm described above, we now give two new rules to infer NWR pairs.

**Proposition 2.** *Consider a target arena* $\mathcal{A} = (V, V_P, E, T)$ *and an under-approximation* $\sqsubseteq$ *of the NWR.*


*Proof (Sketch).* Item (i) follows immediately from the definition of Val. For item (ii), one can use the Bellman optimality equations for infinite-horizon reachability in MDPs to show that, since the successors of $v$ are never worse than the non-dominated successors of $u$, we must have $u \preceq \{v\}$.
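The Bellman optimality equations invoked here are the standard ones for maximal reachability: the value is 1 on the target, the maximum over successors at Protagonist vertices, and the $\mu$-weighted average at Nature vertices. A value-iteration sketch over an explicit finite arena (names and encoding are ours):

```python
def max_reach_values(vertices, protagonist, succ, prob, target, iters=1000):
    """Approximate Val by iterating the Bellman optimality equations
    for maximal reachability of `target`.

    succ: successors of each Protagonist vertex
    prob: for each Nature vertex, dict successor -> probability
    """
    val = {v: (1.0 if v in target else 0.0) for v in vertices}
    for _ in range(iters):
        for v in vertices:
            if v in target:
                continue
            if v in protagonist:
                # Protagonist picks the best successor.
                val[v] = max((val[w] for w in succ[v]), default=0.0)
            else:
                # Nature averages over its successors.
                val[v] = sum(p * val[w] for w, p in prob[v].items())
    return val
```

On a tiny arena where a Protagonist vertex `s` chooses between a Nature vertex `n` (reaching the target `t` with probability 0.5) and a sink `z`, this yields Val(s) = 0.5 as expected.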

**Fig. 4.** Two target arenas with *T* = {*fin*} are shown. Using Propositions 1 and 2 one can conclude that *p* ∼ *q* in both target arenas.

The rules stated in Proposition 2 can be used to infer relations like those depicted in Fig. 4 and are clearly seen to be computable in polynomial time as they speak only of successors of vertices.

#### **7 Conclusions**

We have shown that the never-worse relation is, unfortunately, not computable in polynomial time. On the bright side, we have extended the iterative polynomial-time algorithm from [2] to under-approximate the relation. In that paper, a prototype implementation of the algorithm was used to show empirically that interesting MDPs (from the set of benchmarks included in PRISM [17]) can be drastically reduced.

As future work, we believe it would be interesting to implement an exact algorithm to compute the NWR using SMT solvers. Symbolic implementations of the iterative algorithms should also be tested in practice. In a more theoretical direction, we observe that the planning community has also studied maximizing the probability of reaching a target set of states under the name of MAXPROB (see, e.g., [16,21]). There, online approximations of the NWR would make more sense than the under-approximation we have proposed here. Finally, one could define a notion of never-worse for finite-horizon or quantitative objectives.

**Acknowledgements.** The research leading to these results was supported by the ERC Starting grant 279499: inVEST. Guillermo A. Pérez is an F.R.S.-FNRS Aspirant and FWA postdoc fellow.

We thank Nathanaël Fijalkow for pointing out the relation between this work and the study of interval MDPs and numberless MDPs. We also thank Shaull Almagor, Michaël Cadilhac, Filip Mazowiecki, and Jean-François Raskin for useful comments on earlier drafts of this paper.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **A Hierarchy of Scheduler Classes for Stochastic Automata**

Pedro R. D'Argenio<sup>1,2,3</sup>, Marcus Gerhold<sup>4</sup>, Arnd Hartmanns<sup>4(B)</sup>, and Sean Sedwards<sup>5</sup>

> <sup>1</sup> Universidad Nacional de Córdoba, Córdoba, Argentina dargenio@famaf.unc.edu.ar <sup>2</sup> CONICET, Córdoba, Argentina <sup>3</sup> Saarland University, Saarbrücken, Germany <sup>4</sup> University of Twente, Enschede, The Netherlands {m.gerhold,a.hartmanns}@utwente.nl <sup>5</sup> University of Waterloo, Waterloo, Canada sean.sedwards@uwaterloo.ca

**Abstract.** Stochastic automata are a formal compositional model for concurrent stochastic timed systems, with general distributions and nondeterministic choices. Measures of interest are defined over *schedulers* that resolve the nondeterminism. In this paper we investigate the power of various theoretically and practically motivated classes of schedulers, considering the classic complete-information view and a restriction to non-prophetic schedulers. We prove a hierarchy of scheduler classes w.r.t. unbounded probabilistic reachability. We find that, unlike Markovian formalisms, stochastic automata distinguish most classes even in this basic setting. Verification and strategy synthesis methods thus face a tradeoff between powerful and efficient classes. Using lightweight scheduler sampling, we explore this tradeoff and demonstrate the concept of a useful approximative verification technique for stochastic automata.

### **1 Introduction**

The need to analyse continuous-time stochastic models arises in many practical contexts, including critical infrastructures [4], railway engineering [36], space mission planning [7], and security [28]. This has led to a number of discrete event simulation tools, such as those for networking [34,35,42], whose probabilistic semantics is founded on generalised semi-Markov processes (GSMP [21,33]). Nondeterminism arises through inherent concurrency of independent processes [11], but may also be deliberate underspecification. Modelling such uncertainty with probability is convenient for simulation, but not always adequate [3,29]. Various models and formalisms have thus been proposed to extend continuous-time

This work is supported by the 3TU.BSR, NWO BEAT (602.001.303) and JST ERATO HASUO Metamathematics for Systems Design (JPMJER1603) projects, by ERC grant 695614 (POWVER), and by SeCyT-UNC projects 05/BP12, 05/B497.

© The Author(s) 2018

C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 384–402, 2018. https://doi.org/10.1007/978-3-319-89366-2\_21

stochastic processes with nondeterminism [8,10,19,23,27,38]. It is then possible to *verify* such systems by considering the extremal probabilities of a property. These are the supremum and infimum of the probabilities of the property in the purely stochastic systems induced by classes of *schedulers* (also called *strategies*, *policies* or *adversaries*) that resolve all nondeterminism. If the nondeterminism is considered controllable, one may alternatively be interested in the *planning* problem of synthesising a scheduler that satisfies certain probability bounds.

We consider closed systems of stochastic automata (SA [16]), which extend GSMP and feature both generally distributed stochastic delays as well as discrete nondeterministic choices. The latter may arise from non-continuous distributions (e.g. deterministic delays), urgent edges, and edges waiting on multiple clocks. Numerical verification algorithms exist for very limited subclasses of SA only: Buchholz et al. [13] restrict to phase-type or matrix-exponential distributions, such that nondeterminism cannot arise (as each edge is guarded by a single clock). Bryans et al. [12] propose two algorithms that require an a priori fixed scheduler, continuous bounded distributions, and that all active clocks be reset when a location is entered. The latter forces regeneration on every edge, making it impossible to use clocks as memory between locations. Regeneration is central to the work of Ballarini et al. [6], however they again exclude nondeterminism. The only approach that handles nondeterminism is the region-based approximation scheme of Kwiatkowska et al. [30] for a model closely related to SA, but restricted to bounded continuous distributions. Without that restriction [22], error bounds and convergence guarantees are lost.

Evidently, the combination of nondeterminism and continuous probability distributions is a particularly challenging one. With this paper, we take on the underlying problem from a fundamental perspective: we investigate the power of, and relationships between, different classes of schedulers for SA. Our motivation is, on the one hand, that a clear understanding of scheduler classes is crucial to design verification algorithms. For example, Markov decision process (MDP) model checking works well because memoryless schedulers suffice for reachability, and the efficient time-bounded analysis of continuous-time MDP (CTMDP) exploits a relationship between two scheduler classes that are sufficiently simple, but on their own do not realise the desired extremal probabilities [14]. When it comes to planning problems, on the other hand, practitioners desire *simple* solutions, i.e. schedulers that need little information and limited memory, so as to be explainable and suitable for implementation on e.g. resource-constrained embedded systems. Understanding the capabilities of scheduler classes helps decide on the tradeoff between simplicity and the ability to attain optimal results.

We use two perspectives on schedulers from the literature: the classic complete-information *residual lifetimes* semantics [9], where optimality is defined via history-dependent schedulers that see the entire current state, and *nonprophetic* schedulers [25] that cannot observe the timing of *future* events. Within each perspective, we define classes of schedulers whose views of the state and history are variously restricted (Sect. 3). We prove their relative ordering w.r.t. achieving optimal reachability probabilities (Sect. 4). We find that SA distinguish most classes. In particular, memoryless schedulers suffice in the complete-information setting (as is implicit in the method of Kwiatkowska et al. [30]), but turn out to be suboptimal in the more realistic non-prophetic case. Considering only the relative order of clock expiration times, as suggested by the first algorithm of Bryans et al. [12], surprisingly leads to partly suboptimal, partly incomparable classes. Our distinguishing SA are small and employ a common nondeterministic gadget. They precisely pinpoint the crucial differences and how schedulers interact with the various features of SA, providing deep insights into the formalism itself.

Our study furthermore forms the basis for the application of *lightweight scheduler sampling* (LSS) to SA. LSS is a technique to use Monte Carlo simulation/statistical model checking with nondeterministic models. On every LSS simulation step, a pseudo-random number generator (PRNG) is re-seeded with a hash of the identifier of the current scheduler and the (restricted) information about the current state (and previous states, for history-dependent schedulers) that the scheduler's class may observe. The PRNG's first iterate then determines the scheduler's action deterministically. LSS has been successfully applied to MDP [18,31,32] and probabilistic timed automata [15,26]. Using only constant memory, LSS samples schedulers uniformly from a selected scheduler class to find "near-optimal" schedulers that conservatively approximate the true extremal probabilities. Its principal advantage is that it is largely indifferent to the size of the state space and of the scheduler space; in general, sampling efficiency depends only on the likelihood of selecting near-optimal schedulers. However, the mass of *near*-optimal schedulers in a scheduler class that also includes the optimal scheduler may be *less* than the mass in a class that does *not* include it. Given that the mass of optimal schedulers may be vanishingly small, it may be advantageous to sample from a class of less powerful schedulers. We explore these tradeoffs and demonstrate the concept of LSS for SA in Sect. 5.
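The re-seeding trick at the heart of LSS can be made concrete with a small sketch: the scheduler's decision is a deterministic function of its integer identifier and the class-restricted observation, obtained by seeding a PRNG with a hash of both. This is a conceptual illustration in our own notation, not the implementation used in the cited tools:

```python
import hashlib
import random

def lss_decision(scheduler_id, observation, enabled_actions):
    """Resolve one nondeterministic choice deterministically: re-seed a PRNG
    with a hash of the scheduler's identifier and the (restricted) view of
    the current state/history, then let its first iterate pick an action."""
    digest = hashlib.sha256(repr((scheduler_id, observation)).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return enabled_actions[rng.randrange(len(enabled_actions))]
```

Because the same (identifier, observation) pair always yields the same action, an integer identifies an entire scheduler, and sampling identifiers uniformly samples schedulers from the chosen class using only constant memory.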

**Other Related Work.** Alur et al. first mention nondeterministic stochastic systems similar to SA in [2]. Markov automata (MA [19]), interactive Markov chains (IMC [27]) and CTMDP are special cases of SA restricted to exponential distributions. Song et al. [37] look into partial information distributed schedulers for MA, combining earlier works of de Alfaro [1] and Giro and D'Argenio [20] for MDP. Their focus is on information flow and hiding in parallel specifications. Wolf et al. [39] investigate the power of classic (time-abstract, deterministic and memoryless) scheduler classes for IMC. They establish (non-strict) subset relationships for almost all classes w.r.t. trace distribution equivalence, a very strong measure. Wolovick and Johr [41] show that the class of measurable schedulers for CTMDP is complete and sufficient for reachability problems.

### **2 Preliminaries**

For a given set $S$, its power set is $\mathcal{P}(S)$. We denote by $\mathbb{R}$, $\mathbb{R}^+$, and $\mathbb{R}^+_0$ the sets of real numbers, positive real numbers and non-negative real numbers, respectively. A (discrete) *probability distribution* over a set $\Omega$ is a function $\mu \colon \Omega \to [0, 1]$, such that $\mathrm{support}(\mu) \stackrel{\text{def}}{=} \{\, \omega \in \Omega \mid \mu(\omega) > 0 \,\}$ is countable and $\sum_{\omega \in \mathrm{support}(\mu)} \mu(\omega) = 1$. $\mathrm{Dist}(\Omega)$ is the set of probability distributions over $\Omega$. We write $\mathcal{D}(\omega)$ for the *Dirac* distribution for $\omega$, defined by $\mathcal{D}(\omega)(\omega) = 1$. $\Omega$ is *measurable* if it is endowed with a σ-algebra $\sigma(\Omega)$: a collection of *measurable* subsets of $\Omega$. A (continuous) *probability measure* over $\Omega$ is a function $\mu \colon \sigma(\Omega) \to [0, 1]$, such that $\mu(\Omega) = 1$ and $\mu(\cup_{i \in I} B_i) = \sum_{i \in I} \mu(B_i)$ for any countable index set $I$ and pairwise disjoint measurable sets $B_i \subseteq \Omega$. $\mathrm{Prob}(\Omega)$ is the set of probability measures over $\Omega$. Each $\mu \in \mathrm{Dist}(\Omega)$ induces a probability measure. Given probability measures $\mu_1$ and $\mu_2$, we denote by $\mu_1 \otimes \mu_2$ the *product measure*: the unique probability measure such that $(\mu_1 \otimes \mu_2)(B_1 \times B_2) = \mu_1(B_1) \cdot \mu_2(B_2)$, for all measurable $B_1$ and $B_2$. For a collection of measures $(\mu_i)_{i \in I}$, we analogously denote the product measure by $\bigotimes_{i \in I} \mu_i$. Let $\mathit{Val} \stackrel{\text{def}}{=} V \to \mathbb{R}^+_0$ be the set of valuations for an (implicit) set $V$ of (non-negative real-valued) variables. $\mathbf{0} \in \mathit{Val}$ assigns value zero to all variables. Given $X \subseteq V$ and $v \in \mathit{Val}$, we write $v[X]$ for the valuation defined by $v[X](x) = 0$ if $x \in X$ and $v[X](y) = v(y)$ otherwise.
For $t \in \mathbb{R}^+_0$, $v + t$ is the valuation defined by $(v + t)(x) = v(x) + t$ for all $x \in V$.

**Stochastic Automata** [16] extend labelled transition systems with stochastic *clocks*: real-valued variables that increase synchronously with rate 1 over time and expire some random amount of time after having been *restarted*. Formally:

**Definition 1.** *A* stochastic automaton *(SA) is a tuple* $\langle \mathit{Loc}, \mathcal{C}, A, E, F, \ell_{\mathit{init}}\rangle$*, where* $\mathit{Loc}$ *is a countable set of* locations*,* $\mathcal{C}$ *is a finite set of* clocks*,* $A$ *is the finite* action alphabet*, and* $E \colon \mathit{Loc} \to \mathcal{P}(\mathcal{P}(\mathcal{C}) \times A \times \mathcal{P}(\mathcal{C}) \times \mathrm{Dist}(\mathit{Loc}))$ *is the* edge function*, which maps each location to a finite set of edges that in turn consist of a* guard set *of clocks, a label, a* restart set *of clocks and a distribution over target locations.* $F \colon \mathcal{C} \to \mathrm{Prob}(\mathbb{R}^+_0)$ *is the* delay measure function *that maps each clock to a probability measure, and* $\ell_{\mathit{init}} \in \mathit{Loc}$ *is the* initial location*.*

We also write $\ell \xrightarrow{G,a,R}_E \mu$ for $\langle G, a, R, \mu\rangle \in E(\ell)$. W.l.o.g. we restrict to SA where edges are fully characterised by source state and action label, i.e. whenever $\ell \xrightarrow{G_1,a,R_1}_E \mu_1$ and $\ell \xrightarrow{G_2,a,R_2}_E \mu_2$, then $G_1 = G_2$, $R_1 = R_2$ and $\mu_1 = \mu_2$.

Intuitively, an SA starts in $\ell_{\mathit{init}}$ with all clocks expired. An edge $\ell \xrightarrow{G,a,R}_E \mu$ may be taken only if all clocks in $G$ are expired. If any edge is enabled, some edge must be taken (i.e. all actions are *urgent* and thus the SA is *closed*). When an edge is taken, its action is $a$, all clocks in $R$ are restarted, other expired clocks remain expired, and we move to successor location $\ell'$ with probability $\mu(\ell')$. There, another edge may be taken immediately or we may need to wait until some further clocks expire, and so on. When a clock $c$ is restarted, the time until it expires is chosen randomly according to the probability measure $F(c)$.
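Definition 1 and the intuition above translate directly into a data structure. A minimal encoding in our own (hypothetical) names, with the enabledness test $v(x) \geq e(x)$ from the semantics below:

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

# Edge = (guard set G, action a, restart set R, distribution over locations)
Edge = Tuple[FrozenSet[str], str, FrozenSet[str], Dict[str, float]]

@dataclass
class SA:
    clocks: List[str]
    edges: Dict[str, List[Edge]]           # E: location -> finite set of edges
    delay: Dict[str, Callable[[], float]]  # F: clock -> sampler for its delay measure
    init: str                              # initial location

def enabled(sa: SA, location: str, v: Dict[str, float], e: Dict[str, float]) -> List[Edge]:
    """An edge is enabled iff every clock in its guard has expired: v(x) >= e(x)."""
    return [edge for edge in sa.edges.get(location, [])
            if all(v[x] >= e[x] for x in edge[0])]
```

An edge with an empty guard is always enabled, which is why no time can pass in locations that have one, as in the example below.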

*Example 1.* We show an example SA, $M_0$, in Fig. 1. Its initial location is $\ell_0$. It has two clocks, $x$ and $y$, with $F(x)$ and $F(y)$ both being the continuous uniform distribution over the interval $[0, 1]$. No time can pass in locations $\ell_0$ and $\ell_1$, since they have outgoing edges with empty guard sets. We omit action labels and assume every edge to have a unique label. On entering $\ell_1$, both clocks are restarted. The choice of going to either $\ell_2$ or $\ell_3$ from $\ell_1$ is nondeterministic, since

**Fig. 1.** Example SA $M_0$

**Fig. 2.** Excerpt of the TPTS semantics of $M_0$

the two edges are always enabled at the same time. In $\ell_2$, we have to wait until the first of the two clocks expires. If that is $x$, we have to move to location ✓; if it is $y$, we have to move to ✗. The probability that both expire at the same time is zero. Location $\ell_3$ behaves analogously, but with the target states interchanged.

**Timed Probabilistic Transition Systems** form the semantics of SA. They are finitely-nondeterministic uncountable-state transition systems:

**Definition 2.** *A (finitely nondeterministic)* timed probabilistic transition system *(TPTS) is a tuple* $\langle S, A', T, s_{\mathit{init}}\rangle$*.* $S$ *is a measurable set of states.* $A' = \mathbb{R}^+ \uplus A$ *is the* alphabet*, partitioned into* delays *in* $\mathbb{R}^+$ *and* jumps *in* $A$*.* $T \colon S \to \mathcal{P}(A' \times \mathrm{Prob}(S))$ *is the* transition function*, which maps each state to a finite set of transitions, each consisting of a label in* $A'$ *and a measure over target states. The initial state is* $s_{\mathit{init}} \in S$*. For all* $s \in S$*, we require* $|T(s)| = 1$ *if* $\exists\, \langle t, \mu\rangle \in T(s) \colon t \in \mathbb{R}^+$*, i.e. states admitting delays are deterministic.*

We also write $s \xrightarrow{a}_T \mu$ for $\langle a, \mu\rangle \in T(s)$. A *run* is an infinite alternating sequence $s_0 a_0 s_1 a_1 \ldots \in (S \times A')^{\omega}$, with $s_0 = s_{\mathit{init}}$. A *history* is a finite prefix of a run ending in a state, i.e. an element of $(S \times A')^* \times S$. Runs resolve all nondeterministic and probabilistic choices. A *scheduler* resolves only the nondeterminism:

**Definition 3.** *A measurable function* $\mathfrak{s} \colon (S \times A')^* \times S \to \mathrm{Dist}(A' \times \mathrm{Prob}(S))$ *is a* scheduler *if, for all histories* $h \in (S \times A')^* \times S$*,* $\langle a, \mu\rangle \in \mathrm{support}(\mathfrak{s}(h))$ *implies* $\mathit{lst}_h \xrightarrow{a}_T \mu$*, where* $\mathit{lst}_h$ *is the last state of* $h$*.*

Once a scheduler has chosen $s_i \xrightarrow{a}_T \mu$, the successor state $s_{i+1}$ is picked randomly according to $\mu$. Every scheduler $\mathfrak{s}$ defines a probability measure $\mathbb{P}_{\mathfrak{s}}$ on the space of all runs. For a formal definition, see [40]. As is usual, we restrict to *non-Zeno* schedulers that make time diverge with probability one: we require $\mathbb{P}_{\mathfrak{s}}(\Pi_{\infty}) = 1$, where $\Pi_{\infty}$ is the set of runs where the sum of delays is $\infty$. In the remainder of this paper we consider extremal probabilities of reaching a set of goal locations $G$:

**Definition 4.** *For* $G \subseteq \mathit{Loc}$*, let* $J_G \stackrel{\text{def}}{=} \{\, \langle \ell, v, e\rangle \in S \mid \ell \in G \,\}$*. Let* $\mathfrak{S}$ *be a class of schedulers. Then* $\mathrm{P}^{\mathfrak{S}}_{\min}(G)$ *and* $\mathrm{P}^{\mathfrak{S}}_{\max}(G)$ *are the minimum and maximum* reachability probabilities *for* $G$ *under* $\mathfrak{S}$*, defined as* $\mathrm{P}^{\mathfrak{S}}_{\min}(G) = \inf_{\mathfrak{s} \in \mathfrak{S}} \mathbb{P}_{\mathfrak{s}}(\Pi_{J_G})$ *and* $\mathrm{P}^{\mathfrak{S}}_{\max}(G) = \sup_{\mathfrak{s} \in \mathfrak{S}} \mathbb{P}_{\mathfrak{s}}(\Pi_{J_G})$*, respectively.*

**Semantics of Stochastic Automata.** We present here the residual lifetimes semantics of [9], simplified for closed SA: any delay step must be of the minimum delay that makes some edge become enabled.

**Definition 5.** *The semantics of an SA* $M = \langle \mathit{Loc}, \mathcal{C}, A, E, F, \ell_{\mathit{init}}\rangle$ *is the TPTS*

$$[\![M]\!] = \langle \mathit{Loc} \times \mathit{Val} \times \mathit{Val},\ A \uplus \mathbb{R}^+,\ T\_M,\ \langle \ell\_{\mathit{init}}, \mathbf{0}, \mathbf{0} \rangle\rangle$$

*where the states are triples* , v, e *of the current location , a valuation* v *assigning to each clock its current value, and a valuation* e *keeping track of all clocks' expiration times.* T<sup>M</sup> *is the smallest transition function satisfying inference rules*

$$\frac{\ell \xrightarrow{G, a, R}\_E \mu \quad \mathrm{En}(G, v, e)}{\langle \ell, v, e \rangle \xrightarrow{a}\_{T\_M} \mu \otimes \mathcal{D}(v[R]) \otimes \mathrm{Sample}\_e^R}$$

$$\frac{t \in \mathbb{R}^+ \quad \exists\, \ell \xrightarrow{G, a, R}\_E \mu \colon \mathrm{En}(G, v + t, e) \quad \forall\, t' \in [0, t),\ \ell \xrightarrow{G, a, R}\_E \mu \colon \neg\mathrm{En}(G, v + t', e)}{\langle \ell, v, e \rangle \xrightarrow{t}\_{T\_M} \mathcal{D}(\langle \ell, v + t, e \rangle)}$$

*with* $\mathrm{En}(G, v, e) \stackrel{\text{def}}{=} \forall\, x \in G \colon v(x) \geq e(x)$ *characterising the enabled edges and*

$$\mathrm{Sample}\_e^R \overset{\text{def}}{=} \bigotimes\_{c \in \mathcal{C}} \begin{cases} F(c) & \text{if } c \in R \\ \mathcal{D}(e(c)) & \text{if } c \notin R. \end{cases}$$

The second rule creates *delay* steps of $t$ time units if no edge is enabled from now until just before $t$ time units have elapsed (third premise) but then, after exactly $t$ time units, some edge becomes enabled (second premise). The first rule applies if an edge $\ell \xrightarrow{G,a,R}_E \mu$ is enabled: a transition is taken with the edge's label, the successor state's location is chosen by $\mu$, $v$ is updated by resetting the clocks in $R$ to zero, and the expiration times for the restarted clocks are resampled. All other expiration times remain unchanged. Notice that $[\![M]\!]$ is also a nondeterministic labelled Markov process [40] (a proof can be found in [17]).

*Example 2.* Figure 2 outlines the semantics of $M_0$. The first step from $\ell_0$ to all the states in $\ell_1$ is a single transition. Its probability measure is the product of $F(x)$ and $F(y)$, sampling the expiration times of the two clocks. We exemplify the behaviour of all of these states by showing it for the case of expiration times $e(x)$ and $e(y)$, with $e(x) < e(y)$. In this case, to maximise the probability of reaching ✓, we should take the transition to the state in $\ell_2$. If a scheduler $\mathfrak{s}$ can see the expiration times, noting that only their order matters here, it can always make the optimal choice and achieve $\mathrm{P}^{\{\mathfrak{s}\}}_{\max}(\{✓\}) = 1$.
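The gap described in this example can be reproduced with a small Monte Carlo experiment on $M_0$, under our reading of Fig. 1: a scheduler that observes the expiration order always reaches ✓, while an oblivious one that, say, always picks $\ell_2$ succeeds only about half the time.

```python
import random

def simulate_m0(order_aware, rng):
    """One run of M_0: restart x, y ~ U[0,1] on entering l1, choose l2 or l3;
    in l2, ✓ is reached iff x expires first, in l3 iff y expires first."""
    ex, ey = rng.random(), rng.random()  # expiration times sampled in l1
    if order_aware:
        go_l2 = ex < ey                  # optimal: pick l2 exactly when x fires first
    else:
        go_l2 = True                     # oblivious: always pick l2
    return (ex < ey) if go_l2 else (ey < ex)

rng = random.Random(1)
n = 20000
p_aware = sum(simulate_m0(True, rng) for _ in range(n)) / n
p_blind = sum(simulate_m0(False, rng) for _ in range(n)) / n
```

The estimates come out near 1 and 1/2 respectively, matching $\mathrm{P}^{\{\mathfrak{s}\}}_{\max}(\{✓\}) = 1$ for an expiration-order-aware scheduler.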

### **3 Classes of Schedulers**

We now define classes of schedulers for SA with restricted information, hiding in various combinations the history and parts of states such as clock values and expiration times. All definitions consider TPTS as in Definition 5 with states $\langle \ell, v, e\rangle$, and we require for all $\mathfrak{s}$ that $\langle a, \mu\rangle \in \mathrm{support}(\mathfrak{s}(h)) \Rightarrow \mathit{lst}_h \xrightarrow{a}_T \mu$, as in Definition 3.

#### **3.1 Classic Schedulers**

We first consider the "classic" complete-information setting where schedulers can in particular see expiration times. We start with restricted classes of history-dependent schedulers. Our first restriction hides the values of all clocks, only revealing the total time since the start of the history. This is inspired by the step-counting or time-tracking schedulers needed to obtain optimal step-bounded or time-bounded reachability probabilities on MDP or Markov automata:

**Definition 6.** *A classic history-dependent* global-time *scheduler is a measurable function* $\mathfrak{s} \colon (S|_{\ell,t,e} \times A')^* \times S|_{\ell,t,e} \to \mathrm{Dist}(A' \times \mathrm{Prob}(S))$*, where* $S|_{\ell,t,e} \stackrel{\text{def}}{=} \mathit{Loc} \times \mathbb{R}^+_0 \times \mathit{Val}$ *with the second component being the total time* $t$ *elapsed since the start of the history. We write* $\mathfrak{S}^{\mathit{hist}}_{\ell,t,e}$ *for the set of all such schedulers.*

We next hide the values of all clocks, revealing only their expiration times:

**Definition 7.** *A classic history-dependent* location-based *scheduler is a measurable function* $s\colon (S|_{\ell,e} \times A)^* \times S|_{\ell,e} \to \mathrm{Dist}(A \times \mathrm{Prob}(S))$, *where* $S|_{\ell,e} \stackrel{\text{def}}{=} \mathit{Loc} \times \mathit{Val}$, *with the second component being the clock expiration times* $e$. *We write* $\mathfrak{S}^{hist}_{\ell,e}$ *for the set of all such schedulers.*

Having defined three classes of classic history-dependent schedulers, $\mathfrak{S}^{hist}_{\ell,v,e}$, $\mathfrak{S}^{hist}_{\ell,t,e}$ and $\mathfrak{S}^{hist}_{\ell,e}$, where $\mathfrak{S}^{hist}_{\ell,v,e}$ denotes all schedulers of Definition 3, we also consider them with the restriction that they only see the relative order of clock expiration instead of the exact expiration times: for each pair of clocks $c_1, c_2$, these schedulers see the relation ${\sim} \in \{<, =, >\}$ in $e(c_1) - v(c_1) \sim e(c_2) - v(c_2)$. E.g. in $\ell_1$ of Example 2, the scheduler would not see $e(x)$ and $e(y)$, but only whether $e(x) < e(y)$ or vice versa (since $v(x) = v(y) = 0$, and equality has probability 0 here). We consider this case because the expiration order is sufficient for the first algorithm of Bryans et al. [12], and would allow optimal decisions in $M_0$ of Fig. 1. We denote the relative order information by $o$, and the corresponding scheduler classes by $\mathfrak{S}^{hist}_{\ell,v,o}$, $\mathfrak{S}^{hist}_{\ell,t,o}$ and $\mathfrak{S}^{hist}_{\ell,o}$.

We now define memoryless schedulers, which only see the current state and are at the core of e.g. MDP model checking. On most formalisms, they suffice to obtain optimal reachability probabilities.
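The order view $o$ is straightforward to derive from $v$ and $e$. A minimal helper illustrating the definition (function and variable names are ours; dictionaries stand in for valuations):

```python
from itertools import combinations

# Computes the expiration-order view o from clock values v and expiration
# times e: for each pair of clocks, only the relation between the remaining
# times e(c) - v(c) is revealed.
def expiration_order(v, e):
    remaining = {c: e[c] - v[c] for c in v}
    order = {}
    for c1, c2 in combinations(sorted(v), 2):
        d = remaining[c1] - remaining[c2]
        order[(c1, c2)] = "<" if d < 0 else ("=" if d == 0 else ">")
    return order
```

For instance, with $v(x) = v(y) = 0$ and $e(x) = 0.4 < e(y) = 0.9$ as in Example 2, the scheduler sees only that $x$ expires before $y$.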

**Definition 8.** *A classic* memoryless *scheduler is a measurable function* $s\colon S \to \mathrm{Dist}(A \times \mathrm{Prob}(S))$. *We write* $\mathfrak{S}^{ml}_{\ell,v,e}$ *for the set of all such schedulers.*

We apply the same restrictions as for history-dependent schedulers:

**Definition 9.** *A classic memoryless global-time scheduler is a measurable function* $s\colon S|_{\ell,t,e} \to \mathrm{Dist}(A \times \mathrm{Prob}(S))$, *with* $S|_{\ell,t,e}$ *as in Definition 6. We write* $\mathfrak{S}^{ml}_{\ell,t,e}$ *for the set of all such schedulers.*

**Definition 10.** *A classic memoryless location-based scheduler is a measurable function* $s\colon S|_{\ell,e} \to \mathrm{Dist}(A \times \mathrm{Prob}(S))$, *with* $S|_{\ell,e}$ *as in Definition 7. We write* $\mathfrak{S}^{ml}_{\ell,e}$ *for the set of all such schedulers.*

Again, we also consider memoryless schedulers that only see the expiration order, so we have the memoryless scheduler classes $\mathfrak{S}^{ml}_{\ell,v,e}$, $\mathfrak{S}^{ml}_{\ell,t,e}$, $\mathfrak{S}^{ml}_{\ell,e}$, $\mathfrak{S}^{ml}_{\ell,v,o}$, $\mathfrak{S}^{ml}_{\ell,t,o}$ and $\mathfrak{S}^{ml}_{\ell,o}$. Class $\mathfrak{S}^{ml}_{\ell,o}$ is particularly attractive because it has a compact finite domain.

#### **3.2 Non-prophetic Schedulers**

Consider the SA $M_0$ in Fig. 1. No matter which of the previously defined scheduler classes we choose, we always find a scheduler that achieves probability 1 to reach ✓, and a scheduler that achieves probability 0. This is because they can all see the expiration times or the expiration order of $x$ and $y$ when in $\ell_1$. When in $\ell_1$, $x$ and $y$ have not yet expired (this will only happen later, in $\ell_2$ or $\ell_3$), yet the schedulers already know which clock will "win". The classic schedulers can thus be seen to make decisions based on the timing of *future* events. This *prophetic* scheduling has already been observed in [9], where a "fix" in the form of the *spent lifetimes* semantics was proposed. Hartmanns et al. [25] have shown that this not only still permits prophetic scheduling, but even admits *divine* scheduling, where a scheduler can *change* the future. The authors propose a complex *non-prophetic* semantics that provably removes all prophetic and divine behaviour.

Much of the complication of the non-prophetic semantics of [25] is due to it being specified for open SA that include delayable actions. For the closed SA setting of this paper, prophetic scheduling can be more easily excluded by hiding from the schedulers all information about what will happen in the future of the system's evolution. This information is only contained in the expiration times e or the expiration order o. We can thus keep the semantics of Sect. 2 and modify the definition of schedulers to exclude prophetic behaviour by construction.

In what follows, we thus also consider all scheduler classes of Sect. 3.1 with the added constraint that neither the expiration times nor the expiration order is visible, resulting in the *non-prophetic* classes $\mathfrak{S}^{hist}_{\ell,v}$, $\mathfrak{S}^{hist}_{\ell,t}$, $\mathfrak{S}^{hist}_{\ell}$, $\mathfrak{S}^{ml}_{\ell,v}$, $\mathfrak{S}^{ml}_{\ell,t}$ and $\mathfrak{S}^{ml}_{\ell}$. Any non-prophetic scheduler can only reach ✓ of $M_0$ with probability $\frac{1}{2}$.

### **4 The Power of Schedulers**

Now that we have defined a number of classes of schedulers, we need to determine what the effect of the restrictions is on our ability to optimally control an SA. We thus evaluate the power of scheduler classes w.r.t. unbounded reachability probabilities (Definition 4) on the semantics of SA. We will see that this simple setting already suffices to reveal interesting differences between scheduler classes.

For two scheduler classes $\mathfrak{S}_1$ and $\mathfrak{S}_2$, we write $\mathfrak{S}_1 \succeq \mathfrak{S}_2$ if, for all SA and all sets of goal locations $G$, $P^{\mathfrak{S}_1}_{min}(G) \le P^{\mathfrak{S}_2}_{min}(G)$ and $P^{\mathfrak{S}_1}_{max}(G) \ge P^{\mathfrak{S}_2}_{max}(G)$. We write $\mathfrak{S}_1 \succ \mathfrak{S}_2$ if additionally there exist at least one SA and set $G'$ where $P^{\mathfrak{S}_1}_{min}(G') < P^{\mathfrak{S}_2}_{min}(G')$ or $P^{\mathfrak{S}_1}_{max}(G') > P^{\mathfrak{S}_2}_{max}(G')$. Finally, we write $\mathfrak{S}_1 \approx \mathfrak{S}_2$ for $\mathfrak{S}_1 \succeq \mathfrak{S}_2 \wedge \mathfrak{S}_2 \succeq \mathfrak{S}_1$, and $\mathfrak{S}_1 \not\approx \mathfrak{S}_2$, i.e. the classes are incomparable, for $\mathfrak{S}_1 \not\succeq \mathfrak{S}_2 \wedge \mathfrak{S}_2 \not\succeq \mathfrak{S}_1$. Unless noted otherwise, we omit proofs for $\mathfrak{S}_1 \succeq \mathfrak{S}_2$ when it is obvious that the information available to $\mathfrak{S}_1$ includes the information available to $\mathfrak{S}_2$. All our distinguishing examples are based on the resolution of a single nondeterministic choice between two actions to eventually reach one of two locations. For these examples we therefore prove only w.r.t. the maximum probability $p_{max}$, since the minimum probability is given by $1 - p_{max}$ and an analogous proof for $p_{min}$ can be made by relabelling locations. We may write $P_{max}(\mathfrak{S}^{y}_{x})$ for $P^{\mathfrak{S}^{y}_{x}}_{max}(\{✓\})$ to improve readability.

$$\begin{array}{ccccccc}
\mathfrak{S}^{ml}_{\ell,o} & \prec & \mathfrak{S}^{ml}_{\ell,t,o} & \prec & \mathfrak{S}^{ml}_{\ell,v,o} & \prec & \mathfrak{S}^{hist}_{\ell,o} \\[2pt]
\not\approx & & \not\approx & & \prec & & \approx \\[2pt]
\mathfrak{S}^{ml}_{\ell,e} & \prec & \mathfrak{S}^{ml}_{\ell,t,e} & \prec & \mathfrak{S}^{ml}_{\ell,v,e} & \succ & \mathfrak{S}^{hist}_{\ell,t,o} \\[2pt]
\prec & & \prec & & \approx & & \approx \\[2pt]
\mathfrak{S}^{hist}_{\ell,e} & \approx & \mathfrak{S}^{hist}_{\ell,t,e} & \approx & \mathfrak{S}^{hist}_{\ell,v,e} & \succ & \mathfrak{S}^{hist}_{\ell,v,o}
\end{array}$$

$$\begin{array}{ccccc}
\mathfrak{S}^{ml}_{\ell} & \prec & \mathfrak{S}^{ml}_{\ell,t} & \prec & \mathfrak{S}^{ml}_{\ell,v} \\[2pt]
\prec & & \prec & & \prec \\[2pt]
\mathfrak{S}^{hist}_{\ell} & \approx & \mathfrak{S}^{hist}_{\ell,t} & \approx & \mathfrak{S}^{hist}_{\ell,v}
\end{array}$$

**Fig. 3.** Hierarchy of classic scheduler classes **Fig. 4.** Non-prophetic classes
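The relations $\succeq$, $\succ$ and incomparability defined above can be phrased as a small comparison routine. In the sketch below (illustrative, with a finite table standing in for "all SA and goal sets"), each class is represented by a dict mapping an instance to the pair $(p_{min}, p_{max})$ it achieves:

```python
# S1 >= S2: S1 minimises at least as well and maximises at least as well
# on every instance (SA, G) in the table.
def weakly_dominates(c1, c2):
    return all(c1[k][0] <= c2[k][0] and c1[k][1] >= c2[k][1] for k in c1)

# S1 > S2: additionally strictly better on at least one instance.
def strictly_dominates(c1, c2):
    return weakly_dominates(c1, c2) and any(
        c1[k][0] < c2[k][0] or c1[k][1] > c2[k][1] for k in c1)

# Incomparable: neither class dominates the other.
def incomparable(c1, c2):
    return not weakly_dominates(c1, c2) and not weakly_dominates(c2, c1)
```

For example, a class achieving $(0.25, 0.75)$ on some SA strictly dominates one achieving only $(0.5, 0.5)$ there, while classes achieving $(0.2, 0.6)$ and $(0.3, 0.7)$ are incomparable.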

#### **4.1 The Classic Hierarchy**

We first establish that all classic history-dependent scheduler classes are equivalent:

**Proposition 1.** $\mathfrak{S}^{hist}_{\ell,v,e} \approx \mathfrak{S}^{hist}_{\ell,t,e} \approx \mathfrak{S}^{hist}_{\ell,e}$.

*Proof.* From the transition labels in the history $(S' \times A)^*$, each of which is either an action or a delay in $\mathbb{R}^+$, with $S' \in \{\, S, S|_{\ell,t,e}, S|_{\ell,e} \,\}$ depending on the scheduler class, we can reconstruct the total elapsed time as well as the values of all clocks: to obtain the total elapsed time, sum the labels in $\mathbb{R}^+$ up to each state; to obtain the values of all clocks, do the same per clock and perform the resets of the edges identified by the actions.
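The reconstruction in this proof can be sketched as follows, under an assumed encoding of histories (all names are ours for illustration): labels are either delays (floats) or action names, and `resets` maps each action to the clocks reset by the corresponding edge.

```python
# Sketch of the proof's reconstruction: recover the total elapsed time t
# and all clock values v from the labels of a history.
def reconstruct(history, resets, clocks):
    t = 0.0                          # total elapsed time
    v = {c: 0.0 for c in clocks}     # value of each clock
    for label in history:
        if isinstance(label, float):      # delay step: time advances
            t += label
            for c in v:
                v[c] += label
        else:                             # action step: perform the resets
            for c in resets.get(label, ()):
                v[c] = 0.0
    return t, v
```

For instance, the history "wait 0.5, take edge `a` (which resets $x$), wait 0.25" yields $t = 0.75$, $v(x) = 0.25$ and $v(y) = 0.75$.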

The same argument applies among the expiration-order history-dependent classes:

**Proposition 2.** $\mathfrak{S}^{hist}_{\ell,v,o} \approx \mathfrak{S}^{hist}_{\ell,t,o} \approx \mathfrak{S}^{hist}_{\ell,o}$.

However, the expiration-order history-dependent schedulers are strictly less powerful than the classic history-dependent ones:

**Proposition 3.** $\mathfrak{S}^{hist}_{\ell,v,e} \succ \mathfrak{S}^{hist}_{\ell,v,o}$.

*Proof.* Consider the SA $M_1$ in Fig. 5. Note that the history does not provide any information for making the choice in $\ell_1$: we always arrive after having spent zero time in $\ell_0$ and then having taken the single edge to $\ell_1$. We can analytically determine that $P_{max}(\mathfrak{S}^{hist}_{\ell,v,e}) = \frac{3}{4}$ by going from $\ell_1$ to $\ell_2$ if $e(x) \le \frac{1}{2}$ and to $\ell_3$ otherwise. We would obtain a probability of $\frac{1}{2}$ by always going to either $\ell_2$ or $\ell_3$, or by picking either edge with equal probability. This is the best we can do if $e$ is not visible, and thus $P_{max}(\mathfrak{S}^{hist}_{\ell,v,o}) = \frac{1}{2}$: in $\ell_1$, $v(x) = v(y) = 0$ and the expiration order is always "$y$ before $x$" because $y$ has not yet been started.

Just like for MDP and unbounded reachability probabilities, the classic history-dependent and memoryless schedulers with complete information are equivalent:

**Proposition 4.** $\mathfrak{S}^{hist}_{\ell,v,e} \approx \mathfrak{S}^{ml}_{\ell,v,e}$.

*Proof sketch.* Our definition of TPTS only allows finite nondeterministic choices, i.e. we have a very restricted form of continuous-space MDP. We can thus adapt the argument of the corresponding proof for MDP [5, Lemma 10.102]: For each state (of possibly countably many), we construct a notional optimal memoryless (and deterministic) scheduler in the same way, replacing the summation by an integration for the continuous measures in the transition function. It remains to show that this scheduler is indeed measurable. For TPTS that are the semantics of SA, this follows from the way clock values are used in the guard sets so that optimal decisions are constant over intervals of clock values and expiration times (see e.g. the arguments in [12] or [30]).

On the other hand, when restricting schedulers to see the expiration order only, history-dependent and memoryless schedulers are no longer equivalent:

**Proposition 5.** $\mathfrak{S}^{hist}_{\ell,v,o} \succ \mathfrak{S}^{ml}_{\ell,v,o}$.

*Proof.* Consider the SA $M_2$ in Fig. 6. Let $s^{opt}_{ml}$ be the (unknown) optimal scheduler in $\mathfrak{S}^{ml}_{\ell,v,o}$ w.r.t. the max. probability of reaching ✓. Define $s^{better}_{hist} \in \mathfrak{S}^{hist}_{\ell,v,o}$ as follows: when in $\ell_2$ and the last edge in the history is the left one (i.e. $x$ is expired), go to $\ell_3$; otherwise, behave like $s^{opt}_{ml}$. This scheduler distinguishes $\mathfrak{S}^{hist}_{\ell,v,o}$ and $\mathfrak{S}^{ml}_{\ell,v,o}$ (by achieving a strictly higher max. probability than $s^{opt}_{ml}$) if and only if there are some combinations of clock values (aspect $v$) and expiration orders (aspect $o$) in $\ell_2$ that can be reached with positive probability via the left edge into $\ell_2$, for which $s^{opt}_{ml}$ must nevertheless decide to go to $\ell_4$.

All possible clock valuations in $\ell_2$ can be achieved via either the left or the right edge, but taking the left edge implies that $x$ expires before $z$ in $\ell_2$. It is thus sufficient to show that $s^{opt}_{ml}$ must go to $\ell_4$ in *some* cases where $x$ expires before $z$. The general form of schedulers in $\mathfrak{S}^{ml}_{\ell,v,o}$ in $\ell_2$ is "go to $\ell_3$ iff (a) $x$ expires before $z$ and $v(x) \in S_1$, or (b) $z$ expires before $x$ and $v(x) \in S_2$", where the $S_i$ are measurable subsets of $[0, 8]$. $S_2$ is in fact *irrelevant*: whatever $s^{opt}_{ml}$ does when (b) is satisfied will be mimicked by $s^{better}_{hist}$, because $z$ can only expire before $x$ when coming via the right edge into $\ell_2$. Conditions (a) and (b) are independent.

With $S_1 = [0, 8]$, the max. probability is $\frac{77}{96} = 0.80208\overline{3}$. Since this is the only scheduler in $\mathfrak{S}^{ml}_{\ell,v,o}$ that is *relevant* for our proof and never goes to $\ell_4$ when $x$ expires before $z$, it remains to show that the max. probability under $s^{opt}_{ml}$ is $> \frac{77}{96}$. With $S_1 = [0, \frac{35}{12})$, we have a max. probability of $\frac{7561}{9216} \approx 0.820421$. Thus $s^{opt}_{ml}$ must sometimes go to $\ell_4$ even when the left edge was taken, so $s^{better}_{hist}$ achieves a higher probability and thus distinguishes the classes.

Knowing only the global elapsed time is less powerful than knowing the full history or the values of all clocks:

**Proposition 6.** $\mathfrak{S}^{hist}_{\ell,t,e} \succ \mathfrak{S}^{ml}_{\ell,t,e}$ *and* $\mathfrak{S}^{ml}_{\ell,v,e} \succ \mathfrak{S}^{ml}_{\ell,t,e}$.

*Proof sketch.* Consider the SA $M_3$ in Fig. 7. We have $P_{max}(\mathfrak{S}^{hist}_{\ell,t,e}) = 1$: when in $\ell_3$, the scheduler sees from the history which of the two incoming edges was used, and thus knows whether $x$ or $y$ is already expired. It can then make the optimal choice: go to $\ell_4$ if $x$ is already expired, or to $\ell_5$ otherwise. We also have $P_{max}(\mathfrak{S}^{ml}_{\ell,v,e}) = 1$: the scheduler sees that either $v(x) = 0$ or $v(y) = 0$, which implies that the other clock is already expired, and the argument above applies. However, $P_{max}(\mathfrak{S}^{ml}_{\ell,t,e}) < 1$: the distribution of the elapsed time $t$ on entering $\ell_3$ is itself independent of which edge is taken. With probability $\frac{1}{4}$, exactly one of $e(x)$ and $e(y)$ is below $t$ in $\ell_3$, which implies that that clock has just expired and thus the scheduler can decide optimally. Yet with probability $\frac{3}{4}$, the expiration times are not useful: they are both positive and drawn from the same distribution, but one unknown clock is expired. The wait for $x$ in $\ell_1$ ensures that comparing $t$ with the expiration times in $e$ does not reveal further information in this case.

In the case of MDP, knowing the total elapsed time (i.e. steps) does not make a difference for unbounded reachability. Only for step-bounded properties is that extra knowledge necessary to achieve optimal probabilities. With SA, however, it makes a difference even in the unbounded case:

**Proposition 7.** $\mathfrak{S}^{ml}_{\ell,t,e} \succ \mathfrak{S}^{ml}_{\ell,e}$.

*Proof.* Consider SA $M_4$ in Fig. 8. We have $P_{max}(\mathfrak{S}^{ml}_{\ell,t,e}) = 1$: in $\ell_2$, the remaining time until $y$ expires is $e(y)$, and the remaining time until $x$ expires is $e(x) - t$ for the global time value $t$ as $\ell_2$ is entered. The scheduler can observe all of these quantities and thus optimally go to $\ell_3$ if $x$ will expire first, or to $\ell_4$ otherwise. However, $P_{max}(\mathfrak{S}^{ml}_{\ell,e}) < 1$: $e(x)$ only contains the absolute expiration time of $x$, but without knowing $t$ or the expiration time of $z$ in $\ell_1$, and thus the current value $v(x)$, this scheduler cannot know with certainty which of the clocks will expire first, and is therefore unable to make an optimal choice in $\ell_2$.

Finally, we need to compare the memoryless schedulers that see the clock expiration times with memoryless schedulers that see the expiration order. As noted in Sect. 3.1, these two views of the current state are incomparable unless we also see the clock values:

**Proposition 8.** $\mathfrak{S}^{ml}_{\ell,v,e} \succ \mathfrak{S}^{ml}_{\ell,v,o}$.

*Proof.* $\mathfrak{S}^{ml}_{\ell,v,e} \succ \mathfrak{S}^{ml}_{\ell,v,o}$ follows from the same argument as in the proof of Proposition 3. $\mathfrak{S}^{ml}_{\ell,v,e} \succeq \mathfrak{S}^{ml}_{\ell,v,o}$ holds because knowing the current clock values $v$ and the expiration times $e$ is equivalent to knowing the expiration order, since that is precisely the order of the differences $e(c) - v(c)$ for all clocks $c$.

**Proposition 9.** $\mathfrak{S}^{ml}_{\ell,t,e} \not\approx \mathfrak{S}^{ml}_{\ell,t,o}$.

*Proof.* $\mathfrak{S}^{ml}_{\ell,t,o} \not\succeq \mathfrak{S}^{ml}_{\ell,t,e}$ follows from the same argument as in the proof of Proposition 3. For $\mathfrak{S}^{ml}_{\ell,t,e} \not\succeq \mathfrak{S}^{ml}_{\ell,t,o}$, consider the SA $M_3$ of Fig. 7. We know from the proof of Proposition 6 that $P_{max}(\mathfrak{S}^{ml}_{\ell,t,e}) < 1$. However, if the scheduler knows the order in which the clocks will expire, it knows which one has already expired (the first one in the order), and can thus make the optimal choice in $\ell_3$ to achieve $P_{max}(\mathfrak{S}^{ml}_{\ell,t,o}) = 1$.

**Proposition 10.** $\mathfrak{S}^{ml}_{\ell,e} \not\approx \mathfrak{S}^{ml}_{\ell,o}$.

*Proof.* The argument of Proposition 9 applies by observing that, in $M_3$ of Fig. 7, we also have $P_{max}(\mathfrak{S}^{ml}_{\ell,e}) < 1$ via the same argument as for $\mathfrak{S}^{ml}_{\ell,t,e}$ in the proof of Proposition 6.

Among the expiration-order schedulers, the hierarchy is as expected:

**Proposition 11.** $\mathfrak{S}^{ml}_{\ell,v,o} \succ \mathfrak{S}^{ml}_{\ell,t,o} \succ \mathfrak{S}^{ml}_{\ell,o}$.

*Proof sketch.* Consider $M_5$ of Fig. 9. To maximise the probability, in $\ell_3$ we should go to $\ell_4$ whenever $x$ is already expired or close to expiring, for which the amount of time spent in $\ell_2$ is an indicator. $\mathfrak{S}^{ml}_{\ell,o}$ only knows that $x$ may have expired when the expiration order is "$x$ before $y$", but definitely has not expired when it is "$y$ before $x$". Schedulers in $\mathfrak{S}^{ml}_{\ell,t,o}$ can do better: they also see the amount of time spent in $\ell_2$. Thus $\mathfrak{S}^{ml}_{\ell,t,o} \succ \mathfrak{S}^{ml}_{\ell,o}$. If we modify $M_5$ by adding an initial delay on $x$ from a new $\ell_0$ to $\ell_1$ as in $M_3$, then the same argument can be used to prove $\mathfrak{S}^{ml}_{\ell,v,o} \succ \mathfrak{S}^{ml}_{\ell,t,o}$: the extra delay makes knowing the elapsed time $t$ useless with positive probability, but the exact time spent in $\ell_2$ is visible to $\mathfrak{S}^{ml}_{\ell,v,o}$ as $v(x)$.

We have thus established the hierarchy of classic schedulers shown in Fig. 3, noting that some of the relationships follow from the propositions by transitivity.

#### **4.2 The Non-prophetic Hierarchy**

Each non-prophetic scheduler class is clearly dominated by the classic and expiration-order scheduler classes that otherwise have the same information, for example $\mathfrak{S}^{hist}_{\ell,v,e} \succ \mathfrak{S}^{hist}_{\ell,v}$ (with very simple distinguishing SA). We show that the non-prophetic hierarchy follows the shape of the classic case, including the difference between global-time and pure memoryless schedulers, with the notable exception of memoryless schedulers being weaker than history-dependent ones.

**Proposition 12.** $\mathfrak{S}^{hist}_{\ell,v} \approx \mathfrak{S}^{hist}_{\ell,t} \approx \mathfrak{S}^{hist}_{\ell}$.

*Proof.* This follows from the argument of Proposition 1.

**Proposition 13.** $\mathfrak{S}^{hist}_{\ell,v} \succ \mathfrak{S}^{ml}_{\ell,v}$.

*Proof.* Consider the SA $M_6$ in Fig. 10. It is similar to $M_4$ of Fig. 8, and our arguments are thus similar to those in the proof of Proposition 7. On $M_6$, we have $P_{max}(\mathfrak{S}^{hist}_{\ell,v}) = 1$: in $\ell_2$, the history reveals which of the two incoming edges was used, i.e. which clock is already expired, so the scheduler can make the optimal choice. However, if neither the history nor $e$ is available, we get $P_{max}(\mathfrak{S}^{ml}_{\ell,v}) = \frac{1}{2}$: the only information that can be used in $\ell_2$ are the values of the clocks, but $v(x) = v(y)$, so there is no basis for an informed choice.

**Proposition 14.** $\mathfrak{S}^{hist}_{\ell,t} \succ \mathfrak{S}^{ml}_{\ell,t}$ *and* $\mathfrak{S}^{ml}_{\ell,v} \succ \mathfrak{S}^{ml}_{\ell,t}$.

*Proof.* Consider the SA $M_3$ in Fig. 7. We have $P_{max}(\mathfrak{S}^{hist}_{\ell,t}) = P_{max}(\mathfrak{S}^{ml}_{\ell,v}) = 1$, but $P_{max}(\mathfrak{S}^{ml}_{\ell,t}) = \frac{1}{2}$, by the same arguments as in the proof of Proposition 6.

**Proposition 15.** $\mathfrak{S}^{ml}_{\ell,t} \succ \mathfrak{S}^{ml}_{\ell}$.

*Proof.* Consider the SA $M_4$ in Fig. 8. The schedulers in $\mathfrak{S}^{ml}_{\ell}$ have no information but the current location, so they cannot make an informed choice in $\ell_2$. This and the simple loop-free structure of $M_4$ make it possible to analytically calculate the resulting probability: $P_{max}(\mathfrak{S}^{ml}_{\ell}) = \frac{17}{24} = 0.708\overline{3}$. If information about the global elapsed time $t$ in $\ell_2$ is available, however, the value of $x$ is revealed. This allows making a better choice, e.g. going to $\ell_3$ when $t \le \frac{1}{2}$ and to $\ell_4$ otherwise, resulting in $P_{max}(\mathfrak{S}^{ml}_{\ell,t}) \approx 0.771$ (statistically estimated with high confidence).

We have thus established the hierarchy of non-prophetic schedulers shown in Fig. 4, where some relationships follow from the propositions by transitivity.

### **5 Experiments**

We have built a prototype implementation of lightweight scheduler sampling for SA by extending the Modest Toolset's [24] modes simulator, which already supports deterministic stochastic timed automata (STA [8]). With some care, SA can be encoded into STA. Using the original algorithm for MDP of [18], our prototype works by providing to the schedulers a discretised view of the continuous components of the SA's semantics, which, we recall, is a continuous-space MDP. The currently implemented discretisation is simple: for each real-valued quantity (the value $v(c)$ of clock $c$, its expiration time $e(c)$, and the global elapsed time $t$), it identifies all values that lie within the same interval $[\frac{i}{n}, \frac{i+1}{n})$, for integers $i, n$. We note that better static discretisations are almost certainly possible, e.g. a region construction for the clock values as in [30].
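A minimal sketch of this discretisation (the function name is ours, not from the tool): every real-valued quantity is mapped to the index $i$ of the interval $[\frac{i}{n}, \frac{i+1}{n})$ containing it.

```python
import math

# Map a real value to the index i of the interval [i/n, (i+1)/n) containing
# it; values with the same index are indistinguishable to a sampled scheduler.
def discretise(x, n):
    return math.floor(x * n)
```

With $n = 2$, for example, the values 0.7 and 0.9 both map to interval index 1, i.e. $[\frac{1}{2}, 1)$, so a sampled scheduler cannot distinguish between them; increasing $n$ refines the view.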

We have modelled $M_1$ through $M_6$ as STA in Modest. For each scheduler class and model in the proof of a proposition, and for discretisation factors $n \in \{1, 2, 4\}$, we sampled 10 000 schedulers and performed statistical model checking for each of them in the lightweight manner. In Fig. 11 we report the min. and max. estimates, $(\hat{p}_{min}, \hat{p}_{max})$, over all sampled schedulers. Where different discretisations lead to different estimates, we report the most extremal values. The subscript denotes the discretisation factors that achieved the reported estimates. The analysis for each sampled scheduler was performed with a number of simulation runs sufficient for the overall max./min. estimates to be within $\pm 0.01$ of the true maxima/minima of the *sampled* set of schedulers with probability $\ge 0.95$ [18]. Note that $\hat{p}_{min}$ is an upper bound on the true minimum probability and $\hat{p}_{max}$ is a lower bound on the true maximum probability.
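The core idea of the lightweight approach of [18] can be sketched as follows; the toy model and all names here are illustrative, not from the paper or the tool. A scheduler is identified by an integer $\sigma$, and its decision in a state is derived deterministically by hashing $\sigma$ together with the (discretised) state, so each $\sigma$ induces one reproducible scheduler that can be evaluated by simulation without storing its decisions.

```python
import random

# A scheduler is an integer sigma; its choice in `state` is a deterministic
# function of (sigma, state), so it needs no explicit memory.
def decide(sigma, state, actions):
    return actions[random.Random(hash((sigma, state))).randrange(len(actions))]

def simulate(sigma, rng):
    # Toy model (illustrative): one nondeterministic choice, then a
    # probabilistic outcome with success probability 0.75 or 0.25.
    action = decide(sigma, "choice", ["a", "b"])
    p_win = 0.75 if action == "a" else 0.25
    return rng.random() < p_win

def sample_schedulers(m=100, runs=1000, seed=7):
    rng = random.Random(seed)
    estimates = []
    for sigma in range(m):
        wins = sum(simulate(sigma, rng) for _ in range(runs))
        estimates.append(wins / runs)
    return min(estimates), max(estimates)

pmin_hat, pmax_hat = sample_schedulers()
```

On this toy model the sampled estimates cluster near 0.25 and 0.75, so the reported $(\hat{p}_{min}, \hat{p}_{max})$ bracket the true extremal probabilities from the inside, exactly as noted above.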

Increasing the discretisation factor or increasing the scheduler power generally increases the number of decisions the schedulers *can* make. This may also increase the number of *critical* decisions a scheduler *must* make to achieve the extremal probability. Hence, the sets of discretisation factors associated with specific experiments may be informally interpreted in the following way:



**Fig. 11.** Results from the prototype of lightweight scheduler sampling for SA

The results in Fig. 11 respect and differentiate our hierarchy. In most cases, we found schedulers whose estimates were within the statistical error of calculated optima or of high-confidence estimates achieved by alternative statistical techniques. The exceptions involve $M_3$ and $M_4$. We note that $M_4$ makes use of an additional clock, increasing the dimensionality of the problem and potentially making near-optimal schedulers rarer. The best result for $M_3$ and class $\mathfrak{S}^{ml}_{\ell,t,e}$ was obtained using discretisation factor $n = 2$: a compromise between nearness to optimality and rarity. A greater compromise was necessary for $M_4$ and classes $\mathfrak{S}^{ml}_{\ell,t,e}$, $\mathfrak{S}^{ml}_{\ell,e}$, where we found near-optimal schedulers to be very rare and achieved the best results using discretisation factor $n = 1$.

The experiments demonstrate that lightweight scheduler sampling can produce useful and informative results with SA. The present theoretical results will allow us to develop better abstractions for SA and thus to construct a refinement algorithm for efficient lightweight verification of SA that will be applicable to realistically sized case studies. As is, they already demonstrate the importance of selecting a proper scheduler class for efficient verification, and that restricted classes are useful in planning scenarios.

### **6 Conclusion**

We have shown that the various notions of information available to a scheduler class, such as the history, clock values, expiration order or expiration times, and overall elapsed time, almost all make distinct contributions to the power of the class in SA. Our choice of notions was based on classic scheduler classes relevant for other stochastic models, previous literature on the character of nondeterminism in and verification of SA, and the need to synthesise simple schedulers in planning. Our distinguishing examples clearly expose how to exploit each notion to improve the probability of reaching a goal. For verification of SA, we have demonstrated the feasibility of lightweight scheduler sampling, where the different notions may be used to finely control the power of the lightweight schedulers. To solve stochastic timed planning problems defined via SA, our analysis helps in the case-by-case selection of an appropriate scheduler class that achieves the desired tradeoff between optimal probabilities and ease of implementation of the resulting plan.

We expect the arguments of this paper to extend to steady-state/frequency measures (by adding loops back from absorbing to initial states in our examples), and that our results for classic schedulers transfer to SA with delayable actions. We propose to use the results to develop better abstractions for SA, the next goal being a refinement algorithm for efficient lightweight verification of SA.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Symbolically Quantifying Response Time in Stochastic Models Using Moments and Semirings**

Hugo Bazille<sup>1</sup>, Eric Fabre<sup>1</sup>, and Blaise Genest2(B)

<sup>1</sup> Univ Rennes, Inria, SUMO Team, Rennes, France <sup>2</sup> Univ Rennes, CNRS, IRISA, Rennes, France bgenest@irisa.fr

**Abstract.** We study quantitative properties of the response time in stochastic models. For instance, we are interested in quantifying bounds such that a high percentage of the runs answers a query within these bounds. To study such problems, one could compute probabilities on a state-space blown up by a factor depending on the bound, but this solution is not satisfactory when the bound is large.

In this paper, we propose a new *symbolic* method to quantify bounds on the response time, using the moments of the distribution of simple stochastic systems. We prove that the distribution (and hence the bounds) is uniquely defined given its moments. We provide *optimal* bounds for the response time over all distributions having a pair of these moments. We explain how to *symbolically* compute in polynomial time any moment of the distribution of response times using adequately defined semirings. This allows us to compute optimal bounds in parametric models and to reduce the complexity of computing optimal bounds in hierarchical models.

### **1 Introduction**

Response time has been considered lately as an important property of systems [8,15,21]. In this context, one does not simply want a query to be answered eventually, but to be answered in a reasonable amount of time. In the model-checking community, problems on response time have been studied mainly *qualitatively*, in the context of (pure, that is, non-stochastic) two-player games [8,21]. There, one looks for a strategy ensuring that the lim sup of the response time is finite. This ensures that under this strategy, there will be a bound on the response time to any query. This has been extended in [15] to a quantitative setting, where one wants to optimise the mean response time in a pure two-player game.

In this paper, we consider stochastic systems. In such systems, the response time is a random variable, unlikely to be bounded, as even a single probabilistic loop on a reachable state will make the response time longer than $T$ for a set of runs of small but positive probability, no matter the value of $T$. Instead, we propose to quantify such response times. One way to do that is to obtain the distribution of response times. Another way is to compute, for a probability $0 < p < 1$, the bound $T$ that is satisfied (by a set of runs) with probability at least $1 - p$. In this paper, we tackle both problems. For that, we use the concept of *moments* of the distribution of response times, as described next.

The *moment of order* $r$ of a probability distribution $\delta$ over $\mathbb{R}$ or $\mathbb{R}^+$ is defined as the integral of $x^r \delta(x)$ over the support of $\delta$, when defined (that is, if $x^r \delta(x)$ is measurable and the integral is defined). For instance, the moment of order 1 is the expected value of $\delta$, while the moment of order 2 allows one to compute the standard deviation of $\delta$. Inspired by the computation of entropy for automata [10] (see also [1] for the computation of entropy for (non-Zeno) timed automata), we design new semirings in which each moment corresponds to the sum of weights of the runs reaching a state. This construction can be applied to probabilistic automata (that is, labeled discrete-time Markov chains), as well as to labeled *continuous-time Markov chains*, where time is continuous and is drawn according to some rate. Adapting the Floyd-Warshall algorithm provides a *symbolic* way to compute the first $n$ moments in time cubic in the number of states of the Markov chain and quadratic in $n$. In some sense, we extend the approach of [12,16] from computing probabilities to computing arbitrary moments. This allows us to evaluate the distribution of response times in two ways:

Firstly, thanks to the symbolic expression of moments, we prove that there is a unique distribution having the moments of a distribution of response times of a probabilistic automaton. We can then build a sequence of distributions matching the first n moments, for instance the maximal entropy one [11]. Here, maximal entropy means assuming the least information besides these moments. This sequence of distributions is then ensured to converge in law towards the distribution of response times.

Secondly, we study optimal symbolic bounds on the time to answer a high percentage of queries, obtained from moments. The Tchebychev inequality provides optimal symbolic bounds when considering the space of distributions having one given moment, of any order $i$. We obtain bounds optimal in the space of distributions having two given moments, of any orders $i, j$. We show how this improves Tchebychev bounds on an example. Having symbolic methods allows us, for instance, to deal with parametric systems where the parameters represent uncertain probabilities. In this case, we can compute optimal bounds satisfying all valuations of the parameters. For hierarchical systems [3], which are compact representations of large systems, our symbolic method allows us to design a much more efficient algorithm (e.g. it does not consider the same component twice) to compute the moments, and thus the bounds. Missing proofs can be found in [5].
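To illustrate what moment-based bounds look like, here is a single-moment bound obtained from Markov's inequality applied to $T^i$; it is a weaker, simpler relative of the optimal bounds developed in this paper, shown for intuition only. For a nonnegative response time $T$ with $E[T^i] = m_i$, we have $P(T \ge t) \le m_i / t^i$, so all but a fraction $p$ of the runs answer within $(m_i / p)^{1/i}$ time units.

```python
# Single-moment bounds via Markov's inequality on T^i (illustrative, NOT the
# paper's optimal two-moment bounds): P(T >= t) <= m_i / t**i, hence a
# fraction >= 1 - p of the runs answer within (m_i / p) ** (1/i).
def response_time_bound(moments, p):
    """moments: dict order -> E[T^order]; returns the best available bound."""
    return min((m / p) ** (1.0 / i) for i, m in moments.items())
```

With the hypothetical moments $m_1 = 10$ and $m_2 = 190$, the order-2 bound for $p = 0.1$ is $\sqrt{1900} \approx 43.6$, much tighter than the order-1 bound of 100; this is one reason higher-order moments matter when the proportion of runs to answer is close to 1.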

*Related Work*: Response times in stochastic systems have long been studied by the performance evaluation community under the name "first passage times", e.g. in [22]. Techniques used in this community to compute moments of Markov chains are mostly based on numerical methods, e.g. [13]. While [13] has the same complexity as our symbolic technique, it is very efficient on explicit models. However, these numerical methods are less adaptable than our symbolic algorithm, in particular concerning parametric or hierarchical systems.

Concerning the determinacy of the distribution given its moments, it is known [20] that phase-type distributions of order n are determined by their first 2n − 1 moments. First-passage time distributions in Markov chains with n states are phase-type distributions of order n. However, [20] does not help in characterizing bounds, as it does not ensure that a non-phase-type distribution cannot have exactly the same moments as a phase-type distribution, unlike our result.

Bounding the response time has also been studied in the performance-evaluation community. Again, the methods used there are mostly numerical [6,19]. In [19] (pp. 68–69), a symbolic bound is also provided in the particular case of moments of orders 1, 2 and 3. In [2], it is shown how to use the first two moments of the response time across various components to compute general bounds, using techniques close to ours, but restricted to moments of orders 1 and 2. In our paper, we provide *optimal* bounds for any pair of orders (i, j) ∈ N². Taking into account moments of orders i, j > 3 is important when the proportion of runs to be answered is close to 1.

Last, computing moments finds other applications. For instance, in [4,7,14], complex functions describing the evolution of molecular species are approximated using the first k moments, for some k.

### **2 Probabilistic Automata**

We first introduce a simple class of models, namely *probabilistic automata* (also called *labeled discrete time Markov chains*), on which we can demonstrate our techniques. Later, we will extend our results to handle continuous time, considering Continuous-Time Markov Chains (CTMC), as well as parametric and hierarchical systems.

**Definition 1.** *A probabilistic automaton* A *over a finite alphabet* Σ *is a tuple* (S, Pr, δ_0) *where:*


*Example 1.* The model depicted in Fig. 1 is a probabilistic automaton with 3 states {1, 2, 3}. There is a transition from 1 to 2, labeled **query**, with probability 1. From state 2, with probability 0.9 we stay in state 2 with a transition labeled **wait**, and with probability 0.1 we go to state 3 with a transition labeled **response**. We loop in state 3 with probability 1.

**Fig. 1.** A simple example of a query-response model

A finite sequence π = s_0, a_1, s_1, ..., a_n, s_n ∈ (SΣ)^n S is called a *finite path* starting from s_0 and ending in s_n, and a transition t belongs to π (written t ∈ π) if t = s_i a_{i+1} s_{i+1} for some i. We denote by |π| = n the length of the path π. For a path π_1 ending in s_n and a path π_2 starting from s_n, we define the concatenated path π_1 · π_2, where the last node of π_1 and the first node of π_2 are merged. A path π_1 is a *prefix* of π if there exists a path π_2 such that π_1 · π_2 = π.

For a path π starting in a state s_0, we define P(π) = ∏_{t∈π} Pr(t), the probability that a path with prefix π is executed from s_0. A path π is realizable if P(π) > 0. Let s be a state, and let Π be a set of finite paths starting from s such that no path in Π is a prefix of another path in Π. Then the probability that a path starting from s has a prefix in Π is P(Π) = ∑_{ρ∈Π} P(ρ). We say that Π is *disjoint* if no path ρ of Π is a prefix of another path ρ' ≠ ρ of Π, or, equivalently, Cyl(ρ) ∩ Cyl(ρ') = ∅, where Cyl(ρ) = {π | ρ is a prefix of π}.
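These definitions can be sketched in a few lines of Python on the automaton of Fig. 1 (the dictionary encoding of the transitions is ours, for illustration only):

```python
from math import prod

# Transitions of the automaton of Fig. 1,
# encoded (hypothetically) as (source, label, target) -> probability.
Pr = {
    (1, "query", 2): 1.0,
    (2, "wait", 2): 0.9,
    (2, "response", 3): 0.1,
    (3, "loop", 3): 1.0,
}

def path_prob(path):
    """P(pi): product of the probabilities of the transitions of pi."""
    return prod(Pr[t] for t in path)

def set_prob(paths):
    """P(Pi) for a prefix-free set Pi: sum of the path probabilities."""
    return sum(path_prob(p) for p in paths)

# Paths answering the query after exactly one or two waits; neither is a
# prefix of the other, so their probabilities add up.
pi1 = [(1, "query", 2), (2, "wait", 2), (2, "response", 3)]
pi2 = [(1, "query", 2), (2, "wait", 2), (2, "wait", 2), (2, "response", 3)]
```

Here `path_prob(pi1)` is 1 · 0.9 · 0.1 = 0.09 and `set_prob([pi1, pi2])` is 0.09 + 0.081 = 0.171.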

Some labels of an automaton will be of particular interest concerning response time. Let Σ_Q ⊆ Σ be a subset of labels standing for queries, and Σ_R ⊆ Σ a subset of labels standing for responses. For simplicity, we will assume that there is a unique query type Σ_Q = {q} and a unique response type Σ_R = {r}, with q ≠ r. We will also assume that there is no path with two queries q. To handle cases with several query/response types, it suffices, for each type, to consider only the queries and responses of that type and disregard the other types.

**Problem Statement:** We are interested in quantifying the time between queries and responses, called the *response time*, which is a random variable. One way to quantify it is to produce the distribution of response times, either for each transition labeled by a query, or averaged over these transitions, weighted by the probability to see each of them. Another way is to answer model-checking questions such as: what is the smallest delay T such that the mass of paths unanswered after T units of time is smaller than some probability p?

To compute both the distribution and the delay T, we will use the so-called *moments of the distribution of response times*. The moment of order 1 is the mean value, and the moment of order 2 allows one to compute the standard deviation.

### **3** *Symbolically* **Computing Moments Using Semirings**

In this section, we define moments and explain how to compute them *symbolically* using appropriately-defined semirings.

Let X be the random variable of the response time. If all queries are answered, then X takes values in N; otherwise, X takes values in N ∪ {∞}. Let p(x) be the probability that the response is obtained x units of time after the query, that is, the probability that X = x. The function p is a distribution over response times, with ∑_x p(x) = 1.


#### **3.1 Semirings Associated with Moments**

We will compute moments of the distribution of response times by considering each query individually. We can then take, e.g., the average over all queries (as we assumed that no path contains two queries). Thus, we first fix a state q, the target of a transition labeled by a query. State q symbolizes that a query has just been asked. We then let R be the set of target states of transitions labeled by a response: a state is in R if a response to this query has just been given. For instance, in Fig. 1, we have q = 2 and R = {3}.

We introduce a set of semirings that will allow us to compute symbolically the moment of order n of the distribution of response times to the query associated with state q, for all n ∈ N. We will compute the moment inductively on disjoint subsets Π of paths of A from q to R. For an integer n, we denote μ_n(Π) = ∑_{ρ∈Π} P(ρ)|ρ|^n. Let **Path**_q^R be the set of paths in the automaton A between q and the first occurrence of R. Notice that **Path**_q^R is disjoint. Thus, μ_n(**Path**_q^R) is the moment of order n of the distribution of response times to the query associated with state q. To avoid heavy notations, when R is reduced to one state t, we let **Path**_s^t be the set of paths between s and the first occurrence of t, and we denote μ_n(s, t) = μ_n(**Path**_s^t).

We now give some properties of μ. Let Π_1 be a set of paths ending in some state s, and Π_2 a set of paths starting from s. We denote by Π_1 · Π_2 the set of paths ρ_1 · ρ_2 with ρ_1 ∈ Π_1 and ρ_2 ∈ Π_2.

$$\textbf{Proposition 1.}\text{ For all }n\text{, we have }\mu\_n(\boldsymbol{\Pi}\_1\cdot\boldsymbol{\Pi}\_2) = \sum\_{i=0}^n \binom{n}{i} \mu\_i(\boldsymbol{\Pi}\_1)\cdot\mu\_{n-i}(\boldsymbol{\Pi}\_2)$$
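Proposition 1 is essentially the binomial theorem applied path by path. It can be checked by brute force on two small prefix-free path sets, here represented as (probability, length) pairs with illustrative values of our own choosing:

```python
from math import comb

# Two small hypothetical path sets: Pi1 ends in some state s, Pi2 starts from s.
Pi1 = [(0.5, 2), (0.5, 3)]
Pi2 = [(0.4, 1), (0.6, 4)]

def mu(n, Pi):
    """mu_n(Pi) = sum of P(rho) * |rho|^n over rho in Pi."""
    return sum(p * l**n for p, l in Pi)

def mu_concat(n, Pi1, Pi2):
    """mu_n(Pi1 . Pi2), computed directly on the concatenated paths."""
    return sum(p1 * p2 * (l1 + l2)**n for p1, l1 in Pi1 for p2, l2 in Pi2)

# Proposition 1: mu_n(Pi1 . Pi2) = sum_i C(n, i) mu_i(Pi1) mu_{n-i}(Pi2)
for n in range(6):
    rhs = sum(comb(n, i) * mu(i, Pi1) * mu(n - i, Pi2) for i in range(n + 1))
    assert abs(mu_concat(n, Pi1, Pi2) - rhs) < 1e-9
```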

This property suggests a set of semirings (R^{n+1}, ⊕_n, ⊗_n, 0_n, 1_n) with good properties for computing moments. For (n + 1)-tuples (x_0, ..., x_n) and (y_0, ..., y_n), we define the operations ⊕_n and ⊗_n:

$$\begin{aligned} \text{- } (x\_0, \dots, x\_n) \oplus\_n (y\_0, \dots, y\_n) &= (x\_0 + y\_0, \dots, x\_n + y\_n) \\ \text{- } (x\_0, \dots, x\_n) \otimes\_n (y\_0, \dots, y\_n) &= (z\_0, \dots, z\_n) \text{ with } z\_i = \sum\_{j=0}^i \binom{i}{j} x\_j y\_{i-j} \end{aligned}$$

The neutral element for ⊕_n is 0_n = (0, ..., 0), which is also an annihilator for ⊗_n. The neutral element for ⊗_n is 1_n = (1, 0, ..., 0). In the following, we will simply denote the laws and elements by ⊕, ⊗, 0 and 1.

### **Proposition 2.** *For* n ≥ 0*,* (R^{n+1}, ⊕, ⊗, 0, 1) *defines a commutative semiring.*

Notice that if, for all i ≤ n, we have x_i = μ_i(Π_1) and y_i = μ_i(Π_2), then, denoting (z_0, ..., z_n) = (x_0, ..., x_n) ⊗_n (y_0, ..., y_n), we get μ_i(Π_1 · Π_2) = z_i. Further, if both Π_1 and Π_2 are disjoint, and if no path of Π_1 (resp. Π_2) is a prefix of a path of Π_2 (resp. Π_1), then μ_i(Π_1 ∪ Π_2) = x_i + y_i.

#### **3.2 Computations in a Semiring**

Following the Floyd-Warshall algorithm to sum the weights of paths reaching a state, we will decompose **Path**_q^R inductively using the operations ∪ and ·, and then use the semiring (R^{n+1}, ⊕, ⊗, 0, 1) to perform these computations. The induction is over the number of states in S. Let G be a subset of S disjoint from R: G ∩ R = ∅. For every state s ∈ S \ R, we define **Path**_s^t(G) = {s_0 ··· s_n | s_0 = s, s_n = t, ∀1 ≤ i ≤ n − 1, s_i ∈ G}, the set of paths from state s to state t using only intermediate states in G; the initial state s and the last state t are unconstrained, even if s, t ∈ R or s, t ∉ G.

For a set of paths Π, we define w_n(Π) = (P(Π), μ_1(Π), ..., μ_n(Π)). Let g ∈ G be a state of G. A path ρ in **Path**_s^t(G) has two possibilities: either it does not use g, or it uses g one or several times. We deduce the inductive formula:

$$\begin{aligned} \textbf{Proposition 3.}\ w\_n(\textbf{Path}\_s^t(G)) ={}& w\_n(\textbf{Path}\_s^t(G \setminus \{g\}))\ \oplus \\ w\_n(\textbf{Path}\_s^g(G \setminus \{g\})) &\otimes \left(\bigoplus\_{k=0}^{\infty} w\_n(\textbf{Path}\_g^g(G \setminus \{g\}))^{\otimes k}\right) \otimes w\_n(\textbf{Path}\_g^t(G \setminus \{g\})) \end{aligned}$$

*Proof (sketch).* If ρ does not use g, then ρ is in **Path**_s^t(G \ {g}). Otherwise, ρ can be expressed as ρ_0 ··· ρ_k with:


We can then write an inductive formula satisfied by **Path**_s^t(G):

$$\begin{aligned} \mathbf{Path}\_s^t(\emptyset) &= \{ (s, a, t) \mid Pr(s, a, t) \neq 0 \} \\ \mathbf{Path}\_s^t(G) &= \mathbf{Path}\_s^t(G \setminus \{g\}) \cup \bigcup\_{k=1}^{\infty} \{ \rho\_0 \cdots \rho\_k \mid \rho\_0 \in \mathbf{Path}\_s^g(G \setminus \{g\}), \\ & \qquad \rho\_k \in \mathbf{Path}\_g^t(G \setminus \{g\}),\ \forall j \in [1, k-1],\ \rho\_j \in \mathbf{Path}\_g^g(G \setminus \{g\}) \} \end{aligned}$$

In order to use this formula, we need to compute ⊕_{k=0}^∞ w_n(**Path**_g^g(G \ {g}))^{⊗k}, which represents what happens along a (possibly empty) sequence of cycles from g to g. More generally, let (g, Π) be a pair with g a state and Π a set of paths (cycles) using g exactly twice: the first and last states are g. The pair (g, **Path**_g^g(G \ {g})) satisfies this property. We define w_n^*(Π) = ⊕_{k=0}^∞ w_n(Π)^{⊗k}, where Π^{⊗0} is the empty path, of weight 1 = (1, 0, ..., 0). The restriction on (g, Π) ensures that ⋃_{k=1}^∞ Π^{⊗k} is disjoint. We show that w_n^*(Π) is defined in most cases, namely when P(Π) < 1.

**Proposition 4.** *Let* Π *be a set of paths using state* g *exactly twice, as first and last state. If* P(Π) < 1*, then*

$$w\_n^\*(\varPi)[0] = \mathbb{P}\Big(\bigcup\_{k=0}^{\infty} \varPi^{\otimes k}\Big) = \frac{1}{1 - \mathbb{P}(\varPi)}, \text{ and for } i > 0$$

$$w\_n^\*(\varPi)[i] = \mu\_i\Big(\bigcup\_{k=0}^{\infty} \varPi^{\otimes k}\Big) = \frac{1}{1 - \mathbb{P}(\varPi)} \sum\_{j=0}^{i-1} \binom{i}{j}\, w\_n(\varPi)[i-j] \times w\_n^\*(\varPi)[j]$$

Notice that P(Π) = 1 describes cases where s cannot reach t (as t ∉ G, if P(**Path**_g^g(G)) = 1, it would mean that every path reaching g stays in G forever, and in particular never meets t). Thus, we first compute the set S_1 of states from which there exists a path to R. Notice that for each set Π of paths ending in g ∈ S_1 \ R, we have P(Π) < 1, because there is a positive probability to reach R from g, which is not captured by the paths in Π.
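The closed form of Proposition 4 can be checked against a truncated expansion of the star ⊕_{k≥0} w_n(Π)^{⊗k}. A sketch, taking for Π the wait-loop of Fig. 1 (a single cycle of probability 0.9 and length 1, so w_n(Π) = (0.9, 0.9, 0.9) for n = 2):

```python
from math import comb

def otimes(x, y):
    """Binomial-convolution product of the moment semiring."""
    return tuple(sum(comb(i, j) * x[j] * y[i - j] for j in range(i + 1))
                 for i in range(len(x)))

def star(x):
    """Closed form of Proposition 4; valid when x[0] < 1."""
    s = [1.0 / (1.0 - x[0])]
    for i in range(1, len(x)):
        s.append(s[0] * sum(comb(i, j) * x[i - j] * s[j] for j in range(i)))
    return tuple(s)

w = (0.9, 0.9, 0.9)

# Truncated sum over k = 0 .. 599, starting from the unit (1, 0, 0).
acc, term = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
for _ in range(600):
    acc = tuple(a + t for a, t in zip(acc, term))
    term = otimes(term, w)
```

Both `star(w)` and the truncated sum `acc` evaluate, up to rounding, to (10, 90, 1710), as in Example 2.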

#### **3.3 A Symbolic Algorithm**

From the inductive formulae computing sets of paths from subsets of paths, and computing w_n^*(Π)[i] from w_n^*(Π)[j] for j < i, we deduce Algorithm 1, following the ideas of Floyd-Warshall: we incrementally add non-response states from S_1 \ R, which can be used as intermediate states. Notice that states in S \ S_1 cannot reach R anyway. This algorithm is *symbolic* (or *algebraic*) in that every constant (e.g., Pr(s, a, t)) can be replaced by a variable (see, e.g., Sect. 4.2).

**Theorem 1.** *Let* A = (S, Pr, δ_0) *be a probabilistic automaton. One can compute* μ_i(s, t) *for all* i ≤ n *and* s, t ∈ S *in time* O(n² × |S|³)*.*

*Proof.* In Algorithm 1, after running the outer **for**-loop on g_1, ..., g_j, we have w_n(s, t)[n] = μ_n(**Path**_s^t({g_1, ..., g_j})). At the end of Algorithm 1, we obtain w_n(s, t)[n] = μ_n(**Path**_s^t) = μ_n(s, t).

#### **Algorithm 1:** Algorithm computing the moment of order n

```
for s ∈ S do
    for t ∈ S do
        % Initialization
        w := Σ_{a ∈ Σ} Pr(s, a, t)
        wn(s, t) := (w, w, ..., w)
    end
end
for g ∈ S1 \ R do
    for s ∈ S do
        for t ∈ S do
            wn(s, t) := wn(s, t) ⊕ wn(s, g) ⊗ wn*(g, g) ⊗ wn(g, t)
        end
    end
end
```
To obtain μ_i(s, t) for all i ≤ n, it suffices to run Algorithm 1 inductively on the moments of order 1, ..., n. Computing w_n^*(g, g)[i] in the inner **for**-loop takes time O(i), as w_n(s, t)[j] = w_j(s, t)[j] has already been computed inductively for all j < i. This yields a complexity of O(∑_{i=1}^n i × |S|³) = O(n² × |S|³).

Now, for each query q, we have μ_i(**Path**_q^R) = ∑_{r∈R} μ_i(q, r), as **Path**_q^{r_1} and **Path**_q^{r_2} have no path prefix of one another for r_1 ≠ r_2, r_1, r_2 ∈ R. The moment of order n of the distribution of response times of q is then formally ∞ if μ_0(**Path**_q^R) < 1 (there is a positive probability to never answer q, that is, to have an infinite response time), and μ_n(**Path**_q^R) otherwise.

*Example 2.* For the example of Fig. 1, unfolding the algorithm for n = 2 (that is, for the probability and the moments of orders 1 and 2) gives, after initialization:

w(1, 2) = (1, 1, 1), w(2, 2) = (0.9, 0.9, 0.9), w(2, 3) = (0.1, 0.1, 0.1), and w(1, 3) = (0, 0, 0), as there is no direct transition from state 1 to state 3.

There are no paths with intermediate state 1 or 3, so g = 1 or g = 3 has no impact. For paths with intermediate state g = 2, the algorithm gives:

– w(2, 2) ← w(2, 2) ⊕ w(2, 2) ⊗ w(2, 2)* ⊗ w(2, 2) = w(2, 2) ⊗ w(2, 2)*
– w(2, 3) ← w(2, 3) ⊕ w(2, 2) ⊗ w(2, 2)* ⊗ w(2, 3) = w(2, 3) ⊗ w(2, 2)*
– w(1, 3) ← w(1, 3) ⊕ w(1, 2) ⊗ w(2, 2)* ⊗ w(2, 3)

We have
$$w(2,2)^\* = \left(\frac{1}{1-0.9},\ \frac{0.9}{(1-0.9)^2},\ \frac{0.9}{(1-0.9)^2} + \frac{2 \times 0.9^2}{(1-0.9)^3}\right) = (10, 90, 1710)$$

At the end of the algorithm, we obtain w(2, 3) = (0.1, 0.1, 0.1) ⊗ (10, 90, 1710) = (1, 10, 190), that is, μ_i(2, 3) for i = 0, 1, 2. Hence, in this probabilistic automaton, the probability of responding to the query is 1, with a mean time of 10 and a standard deviation of √(190 − 10²) ≈ 9.5.
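Algorithm 1 and Example 2 can be replayed in a short Python sketch (the dictionary encoding of the automaton is ours; following the proof of Theorem 1, each round over g updates all pairs from a snapshot of the previous round):

```python
from math import comb

def oplus(x, y):
    """Componentwise sum (the semiring addition)."""
    return tuple(a + b for a, b in zip(x, y))

def otimes(x, y):
    """Binomial convolution (the semiring product)."""
    return tuple(sum(comb(i, j) * x[j] * y[i - j] for j in range(i + 1))
                 for i in range(len(x)))

def star(x):
    """Kleene star via the closed form of Proposition 4 (needs x[0] < 1)."""
    s = [1.0 / (1.0 - x[0])]
    for i in range(1, len(x)):
        s.append(s[0] * sum(comb(i, j) * x[i - j] * s[j] for j in range(i)))
    return tuple(s)

def moments(states, prob, inner, n):
    """Algorithm 1; prob maps (s, t) to the transition probability
    (summed over labels), inner lists the states of S1 minus R."""
    w = {(s, t): (prob.get((s, t), 0.0),) * (n + 1)
         for s in states for t in states}
    for g in inner:
        sg = star(w[(g, g)])
        old = dict(w)  # snapshot: updates use the previous round's values
        for s in states:
            for t in states:
                w[(s, t)] = oplus(old[(s, t)],
                                  otimes(otimes(old[(s, g)], sg), old[(g, t)]))
    return w

# Fig. 1: states {1, 2, 3}, query state q = 2, response states R = {3}.
prob = {(1, 2): 1.0, (2, 2): 0.9, (2, 3): 0.1, (3, 3): 1.0}
w = moments([1, 2, 3], prob, inner=[1, 2], n=2)
```

Here `w[(2, 3)]` evaluates, up to rounding, to (1, 10, 190), matching Example 2.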

#### **3.4 Extension to Continuous Time**

We now extend the symbolic computation of moments to *continuous-time Markov chains (CTMCs)*. In order to stay as close as possible to the setting of probabilistic automata, we use the sojourn-time representation of CTMCs. This representation is fully equivalent to the more usual representation of CTMCs with transition rates; see Chap. 7.3 of [9].

**Definition 3.** *A CTMC is a tuple* (S, Pr, δ_0, (λ_s)_{s∈S}) *where:*

*–* (S, Pr, δ_0) *is a probabilistic automaton, and*

*– for all* s*,* λ_s *is the sojourn parameter associated with state* s*. That is, the PDF of the sojourn time in* s *is* X_s(t) = λ_s e^{−λ_s t}*, and the probability to stay in* s *at least* t *units of time is* e^{−λ_s t}*.*

In this continuous context, we need integrals instead of sums to define the i-th moment of a random variable with PDF X: μ_i(X) = ∫_0^∞ t^i X(t) dt. For every state s ∈ S, let X_s(t) = λ_s e^{−λ_s t}. For all i and all s, μ_i(X_s) is well defined and μ_i(X_s) = i!/λ_s^i.

We can easily extend the computation of moments to CTMCs. The inductive formulas for the probabilities and moments of the reaching-time distribution remain unchanged. We only need to change the definition of the moments of every transition, which are input in the initialization phase of Algorithm 1: for all s, t ∈ S, we set w_n(s, t) = (w^0(s, t), w^1(s, t), ..., w^n(s, t)), where w^0(s, t) = ∑_{a∈Σ} Pr(s, a, t) and w^i(s, t) = ∑_{a∈Σ} Pr(s, a, t) · i!/λ_s^i for all i ∈ [1, n].

**Theorem 2.** *Let* A = (S, Pr, δ_0, (λ_s)_{s∈S}) *be a CTMC. One can compute* μ_i(s, t) *for all* i ≤ n *and* s, t ∈ S *in time* O(n² × |S|³)*.*
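As a quick consistency check of the CTMC initialization: for a single query state with a wait self-loop of probability 0.9 and a response transition of probability 0.1, all with sojourn parameter λ, the first-passage time is exponential with parameter 0.1λ, whose first two moments are 10/λ and 200/λ². A sketch (the toy chain and its encoding are ours):

```python
from math import comb, factorial

def oplus(x, y):
    return tuple(a + b for a, b in zip(x, y))

def otimes(x, y):
    return tuple(sum(comb(i, j) * x[j] * y[i - j] for j in range(i + 1))
                 for i in range(len(x)))

def star(x):
    # Closed form of Proposition 4; requires x[0] < 1
    s = [1.0 / (1.0 - x[0])]
    for i in range(1, len(x)):
        s.append(s[0] * sum(comb(i, j) * x[i - j] * s[j] for j in range(i)))
    return tuple(s)

n, lam = 2, 1.0  # number of moments, sojourn parameter (assumed value)

def trans(p):
    """CTMC initialization: w^0 = p and w^i = p * i! / lam^i."""
    return tuple(p * factorial(i) / lam**i for i in range(n + 1))

w_qq, w_qr = trans(0.9), trans(0.1)  # wait loop and response transition

# One round of Algorithm 1 with g = q:
m = oplus(w_qr, otimes(otimes(w_qq, star(w_qq)), w_qr))
```

For λ = 1, `m` evaluates, up to rounding, to (1, 10, 200): probability 1, mean 10 and second moment 200, the moments of an exponential of parameter 0.1.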

### **4 Uniqueness of Distribution, Parameters and Hierarchy**

In this section, we present cases where having a symbolic algorithm enables efficient techniques, compared with numerical methods. We start with hierarchical systems, which are a way to describe large systems compactly. Then, we present the possibility of working on systems with parameters. Finally, thanks to the symbolic expression of moments, we prove that there is a unique distribution having the moments of a distribution of reaching times of a (continuous-time) Markov chain.

#### **4.1 Hierarchical Probabilistic Automata**

We use notations mainly from [3] to describe hierarchical structures:

**Definition 4.** *A hierarchical probabilistic automaton (HPA)* A *over a finite alphabet* Σ *is a tuple of* n *modules* (S_i, Pr_i, λ_i, s_i^0, s_i^f)_{1≤i≤n} *where, for all* i*,*


Intuitively, the system starts in module 1, in state s_1^0. Each time a state s ∈ S_i associated with a module j > i, that is, λ_i(s) = j, is entered by a

**Fig. 2.** An HPA with an exponential number of states.

**Fig. 3.** An HPA without redundancy

transition t → s, the system goes to state s_j^0 and stays in S_j until s_j^f is seen, in which case it comes back to state s and takes a transition s → t (according to the probability distribution from s). This process can be repeated from any state in a module i to any module j as long as j > i.

To define the semantics of (S_i, Pr_i, λ_i, s_i^0, s_i^f)_{1≤i≤n} formally, we inductively replace the states associated with the deepest module by its definition. Indeed, nodes of the deepest module are not associated with any module, by definition. Once every module has been replaced, a (flat) probabilistic automaton with the intended semantics is obtained.

Hence, HPAs have the same expressive power as probabilistic automata. Yet, they may be much more compact: we denote by |A| the size of the description of the hierarchical automaton and by ‖A‖ the size of the unfolded automaton. The interest of such a description is that it may be exponentially smaller than the unfolded automaton, as depicted in Fig. 2: here, every module contains two copies of the next module, with the exception of the last one. While the number of states in the description is linear (4n), the number of states in the unfolded automaton is 3 · 2^n − 2.

The symbolic Algorithm 1 is naturally modular, in that computations on a module used several times can be performed only once, by considering the states of the deepest module first. Indeed, one module can be summarized by three information items: the probability (and moments) to answer the query in this module, the probability (and moments) to leave this module without answering the query, and the probability to stay forever in this module without answering the query. This information can then be used for shallower modules: every time a state s in a module i is associated with the deepest module, it can be replaced by a small set of states containing all the relevant information about the deepest module (computed only once). This process can then be repeated to eliminate modules recursively. This leads to a complexity in the small size |A| of the compact HPA representation rather than in the large size ‖A‖ of the unfolded PA:

**Theorem 3.** *Let* A *be an HPA with* k *modules of size at most* m*. The first* n *moments of the distribution associated with* A *can be computed in time* O(n²km³)*.*

Not only does Theorem 3 reduce the complexity for hierarchical representations with redundancy (O(n²k) for the example in Fig. 2, instead of O(n²2^{3k}) when running the algorithm of [13] on the equivalent flat PA), it also gives a better complexity on structures without redundancy. Consider the example in Fig. 3, without redundancy, whose unfolded PA has 3k + 1 states. Theorem 3 takes time O(n²k·3³), while the algorithm of [13] on the equivalent flat PA would take time O(n²(3k)³).

#### **4.2 Parametric Systems**

Another case where having a symbolic algorithm is helpful is when the system has parameters standing for probability values (see for instance Fig. 4, where p is such a parameter). We illustrate two cases here.

The first case is when parameters help with redundancy. Often, stochastic systems reuse the same constructions, but with different probability values. This would be naturally encoded as a module M of a hierarchical system using a set of parameters P. This module M would be used several times, with different values of parameters specified in each module using it.

In this case, one can run Algorithm 1 on M, using the parameter values literally in the equations. This yields, for all n, rational functions f_n : [0, 1]^P → (0, 1] of the parameters expressing the moments of order n for module M. For instance, in the example of Fig. 4, the probability to reach state 4 from state 1 is (2p+4)/(5p+4), and the mean time is (112 + 44p − 12p²)/((5p+4)(2p+4)). Each time module M is used, f_n can be evaluated using the values of the parameters P for this particular usage.

Another possible use of parameters is to model uncertainty on values. In the example of Fig. 4, we may not know the exact value of parameter p, but only that it is above 0.8. In this case, one may be interested in synthesizing the largest (resp. smallest) moment of order n which is smaller (resp. larger) than the moment of any system realizing the parametric system, that is, where p is replaced by any value above 0.8. This will be particularly interesting in the next section, discussing bounds. To do so, one can use the rational function f_n to compute its minimal and maximal values (e.g., by differentiating it and searching numerically for zeros of the derivative). In this way, we also obtain the best/worst value for p.

**Fig. 4.** Example of a parametric system with set of parameters {p}

#### **4.3 Uniqueness of the Distribution**

Last, we use the symbolic expression of moments obtained in Sect. 3 to prove the uniqueness of the distribution having the moments of the first-passage times of a (continuous-time) Markov chain. This distribution is thus the distribution of response times of the system considered.

Notice that, in general, there may be several distributions corresponding to a given sequence of moments (μ_n)_{n∈N}. This would compromise approximating the distribution using moments, as there would not be a unique such distribution.

*Example 3.* Let us consider a distribution δ on R^+. If δ has the sequence of moments {μ_n = n! | n ∈ N}, then δ is the exponential distribution with parameter 1. Similarly, the sequence of moments {μ_n = (2n)! | n ∈ N} for a distribution on R^+ is characteristic of the square of an exponential distribution with parameter 1.

Now, consider the cube of an exponential distribution with parameter 1. Its sequence of moments is {μ_n = (3n)! | n ∈ N}. However, there exist infinitely many distributions with this sequence of moments [18].
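The first claim of Example 3 is easy to check numerically: the n-th moment of the exponential distribution with parameter 1 is ∫_0^∞ t^n e^{−t} dt = n!. A sketch using a simple midpoint rule (grid step and cutoff are arbitrary choices of ours):

```python
from math import exp, factorial

def exp_moment(n, dt=1e-3, t_max=60.0):
    """Midpoint-rule approximation of the n-th moment of Exp(1)."""
    total, t = 0.0, dt / 2
    while t < t_max:
        total += t**n * exp(-t) * dt
        t += dt
    return total

for n in range(5):
    assert abs(exp_moment(n) - factorial(n)) < 1e-2
```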

We now answer positively the Stieltjes moment problem in the case of the distribution of response times in a (continuous-time) Markov chain: its sequence of moments satisfies Carleman's condition, from 1922, which guarantees the uniqueness of the distribution. The condition is that ∑_{n∈N} μ_n(δ)^{−1/(2n)} = ∞.

**Theorem 4.** *Let* A *be a probabilistic automaton or a CTMC. For all* n ∈ N*, let* μ_n *be the moment of order* n *of the first-passage times in a set of states* R *of* A*. Then there exists a unique distribution* δ *such that* μ_n(δ) = μ_n *for all* n ∈ N*.*

**Sketch of Proof:** We first consider CTMCs where all states have the same sojourn parameter λ. Then, a path that uses i transitions to answer a query follows the gamma distribution with parameters (i, λ). We have a symbolic expression for the moments of this distribution thanks to Sect. 3. This can be used to lower-bound ∑_{n=0}^∞ μ_n(δ)^{−1/(2n)} by a diverging sum.

For general CTMCs, we use the fact that E((E(λ_1) + ··· + E(λ_i))^n) ≤ E(Γ(i, λ_1)^n) when λ_1 = min(λ_j)_{j=1}^i. This allows us to lower-bound the Carleman sum of the CTMC considered by the Carleman sum of the CTMC where all sojourn parameters are replaced by the smallest one, λ, hence the divergence.

The case of probabilistic automata is simpler.

We show in the next subsection how this theorem allows us to approximate the distribution δ.

#### **4.4 A Sequence of Distributions Converging Towards** *δ*

Since we have uniqueness of the distribution corresponding to the sequence of moments of the distribution of response times of a probabilistic automaton, we obtain the following convergence in law:

**Proposition 5 (**[17]**).** *Let* δ *be the distribution of response times of a probabilistic automaton. Let* (δ_i)_{i∈N} *be a sequence of distributions on* R^+ *such that for all* n*,* lim_{i→∞} μ_n(δ_i) = μ_n(δ)*. Then, if* C_i *is the cumulative distribution function of* δ_i *and* C *the cumulative distribution function of* δ*, for all* x*,* lim_{i→∞} C_i(x) = C(x)*.*

Thus, C can be approximated by taking a sequence (δ_n)_{n∈N} of distributions such that for all i ≤ n, μ_i(δ_n) = μ_i(δ). A reasonable choice for δ_n is the distribution of maximal entropy corresponding to the moments μ_1, ..., μ_n, as presented in [11]. The distribution of maximal entropy can be understood as the distribution that assumes the least information. It can be approximated as closely as desired, for instance within 1/n of the distribution of maximal entropy having moments (μ_1(δ), ..., μ_n(δ)). Applying Proposition 5, we thus obtain that the cumulative distribution function associated with δ_n converges towards the one associated with δ.

### **5 Bounding the Response Time**

We now explain how to use moments to obtain optimal bounds on the response time. First, notice that as soon as there exists a loop between a query and a response (as in Fig. 1), there will be runs with arbitrarily long response times, even though every query may be answered with probability 1 (which is the case in Fig. 1). We thus turn to a more quantitative evaluation of the response time.

Let 0 < p < 1. We are interested in a bound T on the delay between a query and a response such that more than 1 − p of the queries are answered within this bound. For a distribution δ : R^+ → R^+ of response times, we denote by B(δ, p) the lowest T such that the probability of a response time above T is lower than p. Equivalently, we look for the highest T such that the probability of a response time above T is at least p.

We place ourselves in the general setting of continuous distributions, where Dirac delta functions are allowed for simplicity. Discrete distributions form a special case, with delta functions at integer values. One could get rid of Dirac delta functions by ε-approximating them without changing the moments, obtaining the same bounds as we prove here.

#### **5.1 Tchebychev Bounds Associated with One Moment**

Let i ∈ N and μ_i > 0. We let Δ_{i,μ_i} be the set of distributions of response times having μ_i as moment of order i. We are interested in bounding B(δ, p) for all δ ∈ Δ_{i,μ_i}, that is, for all distributions with μ_i as moment of order i. Such a bound is provided by the *Tchebychev inequality*, and it is optimal:

**Proposition 6.** *Let $i \in \mathbb{N}$ and $\mu_i > 0$. Let $\alpha_i(\mu_i, p) = \left(\frac{\mu_i}{p}\right)^{\frac{1}{i}}$. Then for all $\delta \in \Delta_{i,\mu_i}$, we have $B(\delta, p) \le \alpha_i(\mu_i, p)$. Further, there exists $\delta \in \Delta_{i,\mu_i}$ such that $B(\delta, p) = \alpha_i(\mu_i, p)$.*

*Proof.* It suffices to remark that if the probability of a response time above a bound $b$ exceeds $p$, then $\mu_i > p\,b^i$, hence $b < \alpha_i(\mu_i, p)$. Further, this bound is trivially optimal: it suffices to consider the distribution with a Dirac of mass $(1 - p)$ at $0$ and a Dirac of mass $p$ at $\alpha_i(\mu_i, p)$.
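The proof can be checked numerically. The sketch below (with hypothetical values of $\mu_i$, $i$, and $p$, not taken from the paper) builds the two-Dirac distribution from the proof and confirms that it has moment $\mu_i$ of order $i$ while placing probability exactly $p$ at the bound $\alpha_i(\mu_i, p)$:

```python
def tchebychev_bound(mu_i: float, i: int, p: float) -> float:
    """One-moment Tchebychev bound: B(delta, p) <= (mu_i / p)**(1/i)."""
    return (mu_i / p) ** (1.0 / i)

# Hypothetical moment of order 2 and target probability p.
mu_i, i, p = 4.0, 2, 0.1
alpha = tchebychev_bound(mu_i, i, p)

# Extremal distribution from the proof: mass (1 - p) at 0, mass p at alpha.
# Its moment of order i reconstructs mu_i exactly, so it lies in Delta_{i, mu_i},
# and it puts probability exactly p at alpha: the bound cannot be improved.
moment = (1 - p) * 0.0 ** i + p * alpha ** i
```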

Given a probabilistic automaton, let $\delta$ be its associated distribution of response times. We can compute its moments $\mu_i$ using Algorithm 1, described in the previous section. We thus know that $\delta \in \Delta_{i,\mu_i}$. Given different values of $i$, one can compute the corresponding moments, apply the Tchebychev bound to each of them, and use the minimal bound obtained.

Understanding the relationship between the $\alpha_i$ is thus important. For $i < j$, one can use Jensen's inequality for the convex function $f : x \mapsto x^{\frac{j}{i}}$ over $\mathbb{R}^+$, and obtain $(\mu_i)^j \le (\mu_j)^i$. For instance, $\mu_1^2 < \mu_2$.
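This moment inequality is easy to check numerically; the sketch below uses a small hypothetical discrete distribution of response times (values and probabilities are illustrative):

```python
# A small discrete distribution of response times: values with probabilities.
values = [1.0, 2.0, 5.0]
probs  = [0.5, 0.3, 0.2]

def moment(k: int) -> float:
    """Moment of order k of the discrete distribution above."""
    return sum(p * v ** k for p, v in zip(probs, values))

mu1, mu2, mu3 = moment(1), moment(2), moment(3)

# Jensen's inequality with f(x) = x^(j/i) gives (mu_i)^j <= (mu_j)^i.
ok_12 = mu1 ** 2 <= mu2        # i = 1, j = 2
ok_23 = mu2 ** 3 <= mu3 ** 2   # i = 2, j = 3
```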

For $p = 1$, this gives $\alpha_i(p = 1) < \alpha_j(p = 1)$. On the other hand, for $p$ sufficiently close to $0$, we have $\alpha_j(p) < \alpha_i(p)$. That is, when $p$ is very small, moments of high order give better bounds than moments of lower order. On the other hand, if $p$ is not that small, moments of small order suffice.

#### **5.2 Optimal Bounds for a Pair of Moments**

We now explain how to extend Tchebychev bounds to pairs of moments: we consider the set of distributions where two moments are fixed. Let $i < j$ be two orders of moments and $\mu_i, \mu_j > 0$. We denote by $\Delta_{i,\mu_i}^{j,\mu_j}$ the set of distributions with $\mu_i, \mu_j$ as moments of order $i, j$ respectively. As $\Delta_{i,\mu_i}^{j,\mu_j}$ is strictly included in both $\Delta_{i,\mu_i}$ and $\Delta_{j,\mu_j}$, $\min(\alpha_i(p), \alpha_j(p))$ is a bound for any $\delta \in \Delta_{i,\mu_i}^{j,\mu_j}$. However, it may be the case that $\min(\alpha_i(p), \alpha_j(p))$ is not optimal. We now provide *optimal* bounds $\alpha_i^j(p)$ for any pair $i < j$ of orders of moments and probability $p$:

**Theorem 5.** *Let $i < j$ be natural integers, $p \in (0, 1)$, and let $\mu_i, \mu_j > 0$. Let $\alpha_i = \left(\frac{\mu_i}{p}\right)^{\frac{1}{i}}$ and $\alpha_j = \left(\frac{\mu_j}{p}\right)^{\frac{1}{j}}$. We define $\alpha_i^j(p)$ to be:*

*– $\alpha_i$ if $\alpha_i \le \alpha_j$,*
*– $\left(\frac{\mu_j - M}{p}\right)^{\frac{1}{j}}$ otherwise, where $0 \le M \le \mu_j$ is the smallest positive real root of:*

$$\mu\_i = (1-p)^{\frac{j-i}{j}} M^{\frac{i}{j}} + p^{\frac{j-i}{j}} (\mu\_j - M)^{\frac{i}{j}}.$$

*For all $\delta \in \Delta_{i,\mu_i}^{j,\mu_j}$, we have $B(\delta, p) \le \alpha_i^j$, and there exists $\delta \in \Delta_{i,\mu_i}^{j,\mu_j}$ with $B(\delta, p) = \alpha_i^j$.*

To obtain a value for $M$, one can use for instance Newton's method. For $i = 1$, $j = 2$, we can compute $M$ explicitly and obtain:

$$
\alpha\_1^2 = \mu\_1 + \sqrt{\frac{(1-p)}{p}(\mu\_2 - \mu\_1^2)}.
$$
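Under hypothetical moment values, this closed form can be compared against the two single-moment Tchebychev bounds; in the instance below the pair bound is strictly better than both:

```python
import math

def alpha_single(mu_i: float, i: int, p: float) -> float:
    """Single-moment Tchebychev bound (mu_i / p)**(1/i)."""
    return (mu_i / p) ** (1.0 / i)

def alpha_pair_12(mu1: float, mu2: float, p: float) -> float:
    """Optimal bound from the pair (mu_1, mu_2): closed form for i=1, j=2."""
    return mu1 + math.sqrt((1 - p) / p * (mu2 - mu1 ** 2))

# Hypothetical moments: mu_1 = 2, mu_2 = 5, with p = 0.1.
mu1, mu2, p = 2.0, 5.0, 0.1
b1  = alpha_single(mu1, 1, p)      # 20.0
b2  = alpha_single(mu2, 2, p)      # sqrt(50) ~ 7.07
b12 = alpha_pair_12(mu1, mu2, p)   # 2 + sqrt(9 * 1) = 5.0
```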

*Example 4.* Consider the distribution associated with the system of Fig. 1. We obtain the following bounds $\alpha_i(p)$ and $\alpha_{i-1}^i(p)$ for different values of $p$ and $i$:


For $p = 0.1$, it is not useful to consider moments of order higher than 3. For $p = 0.01$, the moment of order 5 provides better bounds than moments of lower order.

For hierarchical systems, one can compute moments efficiently using Theorem 3, and then use Theorem 5 to obtain the associated optimal bounds. In order to handle parametric systems, we use the following result, which allows us to underapproximate the value of $M$, and thus overapproximate the optimal bound, by iterating the following operator $f$ from $x = 0$:

$$f: x \mapsto \frac{\left(\mu\_i - (\mu\_j - x)^{\frac{i}{j}}\, p^{\frac{j-i}{j}}\right)^{\frac{j}{i}}}{(1 - p)^{\frac{j-i}{i}}}$$

**Lemma 1.** *The sequence* $(f^n(0))_{n \in \mathbb{N}}$ *is strictly increasing and converges towards* $M$*.*
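Lemma 1 suggests a simple fixed-point computation of $M$. The sketch below iterates $f$ from $x = 0$ for the case $i = 1$, $j = 2$ with hypothetical moment values, where the result can be cross-checked against the closed form for $\alpha_1^2$:

```python
def iterate_M(mu_i: float, mu_j: float, i: int, j: int, p: float,
              steps: int = 200) -> float:
    """Iterate f from x = 0; by Lemma 1 the iterates increase towards M."""
    x = 0.0
    for _ in range(steps):
        num = (mu_i - (mu_j - x) ** (i / j) * p ** ((j - i) / j)) ** (j / i)
        x = num / (1 - p) ** ((j - i) / i)
    return x

# With mu_1 = 2, mu_2 = 5, p = 0.1 the closed form gives alpha_1^2 = 5,
# hence M = mu_2 - p * (alpha_1^2)**2 = 2.5.
mu1, mu2, p = 2.0, 5.0, 0.1
M = iterate_M(mu1, mu2, 1, 2, p)
bound = ((mu2 - M) / p) ** 0.5   # recovers alpha_1^2 = 5
```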

We show how to $\varepsilon$-approximate the *optimal* bound $B$ of a *parametric* probabilistic automaton $A$ with set of parameters $P$, that is, the bound $B$ such that for all $val \in V^P$, the probabilistic automaton $A$ with valuation $val$ for the parameter values has a bound $b(val) \le B$, and there exists $val \in V^P$ such that $b(val) = B$. First, we obtain the moments as symbolic functions of the parameters using Sect. 4.2. Then, we compute $M_1 = f(0)$ as a function of the parameters, using Lemma 1 and replacing $\mu_i, \mu_j$ by their expressions. One can then compute the minimum $m_1$ of the function $M_1$ over all parameter values. We then proceed with $M_2 = f(m_1)$, and so on, until obtaining a value $m$. This yields a lower bound $m$ on the value of $M$ for all parameter values. Computing the largest $\mu_j$ over all parameters then yields an upper bound $B_{up}$: $B \le B_{up} = \left(\frac{\mu_j - m}{p}\right)^{\frac{1}{j}}$. A lower bound $B_{lw}$ is easily obtained by considering the value $\ge m$ of $M$ for the parameters maximizing $\mu_j$. If the distance between $B_{up}$ and $B_{lw}$ is larger than $\varepsilon$, one can partition the space of parameter values into zones and proceed in the same way on each zone, discarding zones for which $B_{up}$ is lower than the $B_{lw}$ of another zone, until the distance between $\max(B_{lw})$ and $\max(B_{up})$ over zones is smaller than $\varepsilon$.

### **6 Conclusion**

In this paper, we have shown how to compute moments symbolically for probabilistic automata and CTMCs, using adequately defined semirings. This method has the same complexity as the efficient numerical methods already known [13]. The proof of this symbolic computation also shows that there is a unique distribution of response times corresponding to a probabilistic automaton or a CTMC. This yields simple approximation schemes for distributions, converging in law towards the distribution of response times. The symbolic computation of moments also makes it possible to compute moments faster in compact (hierarchical) models, as well as to find the lowest/highest value of moments in parametric systems.

We also provide optimal bounds on the delay after which very few queries remain unanswered. The bound is optimal when considering distributions with a given pair of moments, and we showed on a simple example how this improves Tchebychev bounds. This can be used efficiently to obtain bounds for compact (hierarchical) models, or to compute an optimal bound within which almost all queries are answered, even for systems where some parameter values are not known exactly.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Comparator Automata in Quantitative Verification**

Suguman Bansal, Swarat Chaudhuri, and Moshe Y. Vardi

> Rice University, Houston, TX 77005, USA {suguman,swarat}@rice.edu, vardi@cs.rice.edu

**Abstract.** The notion of comparison between system runs is fundamental in formal verification. This concept is implicitly present in the verification of qualitative systems, and is more pronounced in the verification of quantitative systems. In this work, we identify a novel mode of comparison in quantitative systems: the online comparison of the aggregate values of two sequences of quantitative weights. This notion is embodied by *comparator automata* (*comparators*, in short), a new class of automata that read two infinite sequences of weights synchronously and relate their aggregate values.

We show that comparators that are finite-state and accept by the Büchi condition lead to generic algorithms for a number of well-studied problems, including the quantitative inclusion problem and the existence of winning strategies in quantitative graph games with incomplete information, as well as related non-decision problems, such as obtaining a finite representation of all counterexamples in the quantitative inclusion problem.

We study comparators for two aggregate functions: discounted-sum and limit-average. We prove that the discounted-sum comparator is ω-regular for all integral discount factors. Not every aggregate function, however, has an ω-regular comparator. Specifically, we show that the language of sequence-pairs for which limit-average aggregates exist is neither ω-regular nor ω-context-free. Given this result, we introduce the notion of *prefix-average* as a relaxation of limit-average aggregation, and show that it admits ω-context-free comparators.

### **1 Introduction**

Many classic questions in formal methods can be seen as involving *comparisons* between different system runs or inputs. Consider the problem of verifying whether a system S satisfies a linear-time temporal property P. Traditionally, this problem is phrased language-theoretically: S and P are interpreted as sets of (infinite) words, and S is determined to satisfy P if S ⊆ P. The problem, however, can also be framed in terms of a *comparison* between words in S and P. Suppose a word w is assigned a weight of 1 if it belongs to the language of the system

**Electronic supplementary material** The online version of this chapter (https:// doi.org/10.1007/978-3-319-89366-2 23) contains supplementary material, which is available to authorized users.

© The Author(s) 2018

C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 420–437, 2018. https://doi.org/10.1007/978-3-319-89366-2\_23

or property, and 0 otherwise. Then determining if S ⊆ P amounts to checking whether the weight of every word in S is less than or equal to its weight in P [5].

The need for such a formulation is clearer in quantitative systems, in which every run of a word is associated with a sequence of (rational-valued) weights. The weight of a run is given by an *aggregate function* $f : \mathbb{Q}^\omega \to \mathbb{R}$, which returns the real-valued *aggregate value* of the run's weight sequence. The weight of a word is given by the supremum or infimum of the weights of all its runs. Common examples of aggregate functions include discounted-sum and limit-average.

In a well-studied class of problems involving quantitative systems, the objective is to check whether the aggregate values of words of a system exceed a constant threshold value [14–16]. This is a natural generalization of emptiness problems in qualitative systems. Known solutions to the problem involve arithmetic reasoning via linear programming and graph algorithms such as negative-weight cycle detection, computation of the maximum weight of cycles, etc. [4,18].

A more general notion of comparison relates aggregate values of two weight sequences. Such a notion arises in the *quantitative inclusion problem* for weighted automata [1], where the goal is to determine whether the weight of words in one weighted automaton is less than that in another. Here it is necessary to compare the aggregate value along runs between the two automata. Approaches based on arithmetic reasoning do not, however, generalize to solving such problems. In fact, the known solution to discounted-sum inclusion with integer discount-factor combines linear programming with a *specialized* subset-construction-based determinization step, rendering an EXPTIME algorithm [4,6]. Yet, this approach does not match the PSPACE lower bound for discounted-sum inclusion.

In this paper, we present an automata-theoretic formulation of this form of comparison between weighted sequences. Specifically, we introduce *comparator automata* (*comparators*, in short), a class of automata that read pairs of infinite weight sequences synchronously, and compare their aggregate values in an online manner. While comparisons between weight sequences happen implicitly in prior approaches to quantitative systems, comparator automata make these comparisons explicit. We show that this has many benefits, including generic algorithms for a large class of quantitative reasoning problems, as well as a direct solution to the problem of discounted-sum inclusion that also closes its complexity gap.

A *comparator for aggregate function* $f$ is an automaton that accepts a pair $(A, B)$ of sequences of bounded rational numbers iff $f(A) \mathrel{R} f(B)$, where $R$ is an inequality relation ($>$, $<$, $\ge$, $\le$) or the equality relation. A comparator could be finite-state or (pushdown) infinite-state. This paper studies such comparators.

A comparator is ω-*regular* if it is finite-state and accepts by the Büchi condition. We show that ω-regular comparators lead to generic algorithms for a number of well-studied problems, including the quantitative inclusion problem and showing the existence of winning strategies in incomplete-information quantitative games. Our algorithm yields PSPACE-completeness of quantitative inclusion when the ω-regular comparator is provided. The same algorithm extends to obtaining finite-state representations of counterexample words in inclusion.

Next, we show that the discounted-sum aggregate function admits an ω-regular comparator when the discount-factor d > 1 is an integer. Using properties of ω-regular comparators, we conclude that discounted-sum inclusion is PSPACE-complete, hence resolving the complexity gap. Furthermore, we prove that the discounted-sum comparator for 1 < d < 2 cannot be ω-regular. We suspect this result extends to non-integer discount-factors as well.

Finally, we investigate the limit-average comparator. Since limit-average is only defined for sequences in which the averages of prefixes converge, limit-average comparison is not well-defined in general. We show that even a Büchi pushdown automaton cannot separate sequences for which the limit-average exists from those for which it does not. Hence, we introduce the novel notion of *prefix-average comparison* as a relaxation of limit-average comparison. We show that prefix-average comparison admits a comparator that is ω-context-free, i.e., given by a Büchi pushdown automaton, and we discuss the utility of this characterization.

This paper is organized as follows: preliminaries are given in Sect. 2. Comparator automata are formally defined in Sect. 3. Generic algorithms for ω-regular comparators are discussed in Sects. 3.1 and 3.2. The construction and properties of the discounted-sum comparator, and of the limit-average and prefix-average comparators, are given in Sects. 4 and 5, respectively. We conclude with future directions in Sect. 6.

**Related Work.** The notion of comparison has been widely studied in quantitative settings. Here we mention only a few examples. Such aggregate-function-based notions appear in weighted automata [1,17], quantitative games including mean-payoff and energy games [16], discounted-payoff games [3,4], in systems regulating cost, memory consumption, power consumption, verification of quantitative temporal properties [14,15], and others. Common solution approaches include graph algorithms such as computing the weight of cycles or detecting the presence of cycles [18], linear-programming-based approaches, fixed-point-based approaches [8], and the like. The choice of approach for a problem typically depends on the underlying aggregate function. In contrast, in this work we present an automata-theoretic approach that unifies solution approaches to problems on different aggregate functions. We identify a class of aggregate functions, those that have an ω-regular comparator, and present generic algorithms for some of these problems.

While work on finite representations of counterexamples and witnesses in the qualitative setting is known [5], we are not aware of such work in the quantitative verification domain. This work can be interpreted as automata-theoretic arithmetic, which has been explored in regular real analysis [12].

### **2 Preliminaries**

**Definition 1 (Büchi automata** [21]**).** *A (finite-state)* Büchi automaton *is a tuple* $A = (S, \Sigma, \delta, \mathit{Init}, F)$*, where* $S$ *is a finite set of* states*,* $\Sigma$ *is a finite* input alphabet*,* $\delta \subseteq (S \times \Sigma \times S)$ *is the* transition relation*,* $\mathit{Init} \subseteq S$ *is the set of* initial states*, and* $F \subseteq S$ *is the set of* accepting states*.*

A Büchi automaton is *deterministic* if $|\mathit{Init}| = 1$ and, for all states $s$ and inputs $a$, $|\{s' \mid (s, a, s') \in \delta\}| \le 1$. Otherwise, it is *nondeterministic*. For a word $w = w_0 w_1 \cdots \in \Sigma^\omega$, a *run* $\rho$ of $w$ is a sequence of states $s_0 s_1 \ldots$ s.t. $s_0 \in \mathit{Init}$ and $\tau_i = (s_i, w_i, s_{i+1}) \in \delta$ for all $i$. Let $\mathit{inf}(\rho)$ denote the set of states that occur infinitely often in run $\rho$. A run $\rho$ is an *accepting run* if $\mathit{inf}(\rho) \cap F \neq \emptyset$. A word $w$ is an accepting word if it has an accepting run. Büchi automata are known to be closed under set-theoretic union, intersection, and complementation [21]. Languages accepted by these automata are called ω-*regular languages*.

**Definition 2 (Weighted** ω*-***automaton** [10,20]**).** *A* weighted ω*-*automaton *over infinite words is a tuple* $A = (M, \gamma)$*, where* $M = (S, \Sigma, \delta, \mathit{Init}, S)$ *is a Büchi automaton, and* $\gamma : \delta \to \mathbb{Q}$ *is a* weight function*.*

*Words* and *runs* in weighted ω-automata are defined as they are in Büchi automata. Note that all states are accepting states in this definition. The *weight sequence* of run $\rho = s_0 s_1 \ldots$ of word $w = w_0 w_1 \ldots$ is given by $wt_\rho = n_0 n_1 n_2 \ldots$ where $n_i = \gamma(s_i, w_i, s_{i+1})$ for all $i$. The *weight of a run* $\rho$ is given by $f(wt_\rho)$, where $f : \mathbb{Q}^\omega \to \mathbb{R}$ is an *aggregate function*. We use $f(\rho)$ to denote $f(wt_\rho)$.

Here the *weight of a word* $w \in \Sigma^\omega$ in a weighted ω-automaton is defined as $wt_A(w) = \sup\{f(\rho) \mid \rho$ is a run of $w$ in $A\}$. It can also be defined as the infimum of the weights of all its runs. By convention, if a word $w \notin A$, then $wt_A(w) = 0$ [10].
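As a minimal illustration of the sup/inf convention, the snippet below computes the weight of a word from the aggregate values of its runs; the runs and their weight cycles are hypothetical, and the LimSup of an eventually periodic sequence is simply the maximum of its repeating cycle:

```python
def word_weight(run_weights, f, mode="sup"):
    """Weight of a word: sup (or inf) of the aggregate values of its runs."""
    vals = [f(ws) for ws in run_weights]
    return max(vals) if mode == "sup" else min(vals)

# Two hypothetical runs of one word, each given by the repeating cycle of its
# eventually periodic weight sequence; LimSup of such a sequence = max(cycle).
runs = [[0, 1], [2, 0]]                  # LimSup values: 1 and 2
w_sup = word_weight(runs, max)           # supremum over runs
w_inf = word_weight(runs, max, "inf")    # infimum over runs
```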

**Definition 3 (Quantitative inclusion).** *Given two weighted* ω*-automata* $P$ *and* $Q$ *with aggregate function* $f$*, the* quantitative-inclusion problem*, denoted by* $P \subseteq_f Q$*, asks whether for all words* $w \in \Sigma^\omega$*,* $wt_P(w) \le wt_Q(w)$*.*

Quantitative inclusion is PSPACE-complete for limsup and liminf [10], and undecidable for limit-average [16]. For discounted-sum with integer discount-factor it is in EXPTIME [6,10], and decidability is unknown for rational discount-factors.

**Definition 4 (Incomplete-information quantitative games).** *An* incomplete-information quantitative game *is a tuple* $G = (S, s_I, O, \Sigma, \delta, \gamma, f)$*, where* $S$*,* $O$*,* $\Sigma$ *are sets of* states*,* observations*, and* actions*, respectively,* $s_I \in S$ *is the* initial state*,* $\delta \subseteq S \times \Sigma \times S$ *is the* transition relation*,* $\gamma : S \to \mathbb{N} \times \mathbb{N}$ *is the* weight function*, and* $f : \mathbb{N}^\omega \to \mathbb{R}$ *is the* aggregate function*.*

The transition relation $\delta$ is *complete*, i.e., for all states $p$ and actions $a$, there exists a state $q$ s.t. $(p, a, q) \in \delta$. A *play* $\rho$ is a sequence $s_0 a_0 s_1 a_1 \ldots$, where $\tau_i = (s_i, a_i, s_{i+1}) \in \delta$. The *observation of state* $s$ is denoted by $O(s) \in O$. The *observed play* $o_\rho$ of $\rho$ is the sequence $o_0 a_0 o_1 a_1 \ldots$, where $o_i = O(s_i)$. Player $P_0$ has incomplete information about the game $G$; it only perceives the observed play $o_\rho$. Player $P_1$ receives full information and witnesses play $\rho$. Plays begin in the initial state $s_0 = s_I$. For $i \ge 0$, player $P_0$ selects action $a_i$. Next, player $P_1$ selects the state $s_{i+1}$ such that $(s_i, a_i, s_{i+1}) \in \delta$. The *weight of state* $s$ is the pair of payoffs $\gamma(s) = (\gamma(s)_0, \gamma(s)_1)$. The *weight sequence* $wt_i$ of player $P_i$ along $\rho$ is given by $\gamma(s_0)_i \gamma(s_1)_i \ldots$, and its payoff from $\rho$ is given by $f(wt_i)$ for aggregate function $f$, denoted by $f(\rho_i)$ for simplicity. A play on which a player receives a greater payoff is said to be a *winning play* for that player. A strategy for player $P_0$ is given by a function $\alpha : O^* \to \Sigma$, since it only sees observations. Player $P_0$ follows strategy $\alpha$ if for all $i$, $a_i = \alpha(o_0 \ldots o_i)$. A strategy $\alpha$ is said to be a *winning strategy* for player $P_0$ if all plays following $\alpha$ are winning plays for $P_0$.

**Definition 5 (Büchi pushdown automata** [13]**).** *A* Büchi pushdown automaton (Büchi PDA) *is a tuple* $A = (S, \Sigma, \Gamma, \delta, \mathit{Init}, Z_0, F)$*, where* $S$*,* $\Sigma$*,* $\Gamma$*, and* $F$ *are finite sets of* states*,* input alphabet*,* pushdown alphabet*, and* accepting states*, respectively.* $\delta \subseteq (S \times \Gamma \times (\Sigma \cup \{\varepsilon\}) \times S \times \Gamma)$ *is the* transition relation*, $\mathit{Init} \subseteq S$ is a set of* initial states*, and* $Z_0 \in \Gamma$ *is the* start symbol*.*

A *run* $\rho$ on a word $w = w_0 w_1 \cdots \in \Sigma^\omega$ of a Büchi PDA $A$ is a sequence of configurations $(s_0, \gamma_0), (s_1, \gamma_1), \ldots$ satisfying (1) $s_0 \in \mathit{Init}$, $\gamma_0 = Z_0$, and (2) $(s_i, \gamma_i, w_i, s_{i+1}, \gamma_{i+1}) \in \delta$ for all $i$. A Büchi PDA has a *stack*, whose elements are the tokens of $\Gamma$, with initial element $Z_0$. Transitions *push* or *pop* token(s) to/from the top of the stack. Let $\mathit{inf}(\rho)$ be the set of states that occur infinitely often in the state sequence $s_0 s_1 \ldots$ of run $\rho$. A run $\rho$ is an *accepting run* in a Büchi PDA if $\mathit{inf}(\rho) \cap F \neq \emptyset$. A word $w$ is an *accepting word* if it has an accepting run. Languages accepted by Büchi PDA are called ω-*context-free languages* (ω-CFL).

We introduce some notation. For an infinite sequence $A = (a_0, a_1, \ldots)$, $A[i]$ denotes its $i$-th element. Abusing notation, we write $w \in A$ and $\rho \in A$ if $w$ and $\rho$ are an accepting word and an accepting run of $A$, respectively.

For missing proofs and constructions, refer to the supplementary material.

### **3 Comparator Automata**

*Comparator automata* (often abbreviated as *comparators*) are a class of automata that can read pairs of weight sequences synchronously and establish an equality or inequality relationship between these sequences. Formally, we define:

**Definition 6 (Comparator automata).** *Let* $\Sigma$ *be a finite set of rational numbers, and let* $f : \mathbb{Q}^\omega \to \mathbb{R}$ *denote an aggregate function. A* comparator automaton for aggregate function $f$ *is an automaton over the alphabet* $\Sigma \times \Sigma$ *that accepts a pair* $(A, B)$ *of (infinite) weight sequences iff* $f(A) \mathrel{R} f(B)$*, where* $R$ *is an inequality or the equality relation.*

From now on, unless mentioned otherwise, we assume that all weight sequences are bounded sequences of natural numbers. The boundedness assumption is justified since the set of weights forming the alphabet of a comparator is bounded. For all aggregate functions considered in this paper, the result of comparing weight sequences is preserved by a uniform linear transformation that converts rational-valued weights into natural numbers, justifying the natural-number assumption.
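One such transformation can be sketched as follows: scale by a common denominator, then shift so the minimum weight becomes 0. The helper name and example weights are illustrative (and `math.lcm` requires Python 3.9+); scaling is uniform across both sequences, so sup/limsup comparisons are unaffected, and a common shift cancels out in discounted-sum differences.

```python
from fractions import Fraction
from math import lcm

def to_naturals(weights):
    """Uniform positive affine map sending finitely many rationals to naturals.
    Applied to every weight in the alphabet, it preserves the outcome of
    comparisons between the aggregate values of weight sequences."""
    d = lcm(*(w.denominator for w in weights))   # clear all denominators
    scaled = [w * d for w in weights]
    shift = min(scaled)                          # shift minimum to 0
    return [int(w - shift) for w in scaled]

alphabet = [Fraction(-1, 2), Fraction(1, 3), Fraction(2)]
naturals = to_naturals(alphabet)   # [0, 5, 15]
```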

We explain comparators through an example. The *limit supremum* (limsup, in short) of a bounded integer sequence $A$, denoted by $\mathit{LimSup}(A)$, is the largest integer that appears infinitely often in $A$. The *limsup comparator* is a Büchi automaton that

**Fig. 1.** State $f_k$ is an accepting state. Automaton $A_k$ accepts $(A, B)$ iff $\mathit{LimSup}(A) = k$ and $\mathit{LimSup}(B) \le k$. Here $*$ denotes $\{0, 1, \ldots, \mu\}$ and $\le m$ denotes $\{0, 1, \ldots, m\}$.

accepts the pair $(A, B)$ of sequences iff $\mathit{LimSup}(A) \ge \mathit{LimSup}(B)$.

The limsup comparator works by nondeterministically guessing the limsup of sequences $A$ and $B$, and then verifying that $\mathit{LimSup}(A) \ge \mathit{LimSup}(B)$. The Büchi automaton $A_k$ (Fig. 1) illustrates the basic building block of the limsup comparator: $A_k$ accepts a pair $(A, B)$ of number sequences iff $\mathit{LimSup}(A) = k$ and $\mathit{LimSup}(B) \le k$, for integer $k$. To see why this is true, first note that all incoming edges to accepting state $f_k$ occur on alphabet $(k, \le k)$, while all transitions between states $f_k$ and $s_k$ occur on alphabet $(\le k, \le k)$, where $\le k$ denotes the set $\{0, 1, \ldots, k\}$. So the integer $k$ must appear infinitely often in $A$, and all elements occurring infinitely often in $A$ and $B$ are less than or equal to $k$. Together these ensure that $\mathit{LimSup}(A) = k$ and $\mathit{LimSup}(B) \le k$. The union of the automata $A_k$ for $k \in \{0, 1, \ldots, \mu\}$, for upper bound $\mu$, results in the limsup comparator. The *limit infimum* (liminf, in short) of an integer sequence is the smallest integer that appears infinitely often in it; its comparator is similar.
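On eventually periodic sequences, the limsup, and hence the comparator's acceptance condition, can be simulated directly, mirroring the guess-and-verify structure of the automata $A_k$ (the representation by prefix and cycle is an illustrative choice, not part of the paper's construction):

```python
def limsup(prefix, cycle):
    """LimSup of the eventually periodic sequence prefix . cycle^omega is the
    largest value occurring infinitely often, i.e. the maximum of the cycle."""
    return max(cycle)

def limsup_comparator(a, b):
    """Accepts (A, B) iff LimSup(A) >= LimSup(B): guess k = LimSup(A) and
    check LimSup(B) <= k, as the component automaton A_k does."""
    k = limsup(*a)
    return limsup(*b) <= k

A = ([5, 9], [1, 3, 2])   # LimSup(A) = 3; the finite prefix is irrelevant
B = ([7], [2, 0])         # LimSup(B) = 2
```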

When the comparator for an aggregate function is a Büchi automaton, we call it an ω-*regular comparator*. Likewise, when the comparator for an aggregate function is a Büchi pushdown automaton, we call it an ω-*context-free comparator*. As seen here, the limsup and liminf comparators are ω-regular. Later, we see that the discounted-sum comparator and the prefix-average comparator are ω-regular and ω-context-free, respectively (Sects. 4 and 5). We call an aggregate function ω-*regular* when it has an ω-regular comparator for at least one inequality relation. Due to the closure properties of Büchi automata, the comparators for all inequality and equality relations of an ω-regular aggregate function are then also ω-regular.

**Fig. 2.** Weighted automaton P

**Motivating Example.** Let weighted ω-automata $P$ and $Q$ be as illustrated in Figs. 2 and 3. The word $w = a(ab)^\omega$ has two runs $\rho^P_1 = q_1(q_2)^\omega$, $\rho^P_2 = q_1(q_3)^\omega$

### **Algorithm 1.** InclusionReg($P$, $Q$, $A_f$): Is $P \subseteq_f Q$?

1: **Input:** Weighted automata $P$, $Q$, and ω-regular comparator $A_f$ (inequality $\le$)
2: **Output:** True if $P \subseteq_f Q$, False otherwise
3: $\hat{P}$ ← AugmentWtAndLabel($P$)
4: $\hat{Q}$ ← AugmentWtAndLabel($Q$)
5: $\hat{P} \times \hat{Q}$ ← MakeProduct($\hat{P}$, $\hat{Q}$)
6: *DimProof* ← Intersect($\hat{P} \times \hat{Q}$, $A_f$)
7: *Dim* ← FirstProject(*DimProof*)
8: **return** $\hat{P} \equiv$ *Dim*

in $P$, and four runs $\rho^Q_1 = q_1(q_2)^\omega$, $\rho^Q_2 = q_1(q_3)^\omega$, $\rho^Q_3 = q_1 q_1(q_2)^\omega$, $\rho^Q_4 = q_1 q_1(q_3)^\omega$ in $Q$. Their weight sequences are $wt^P_1 = 3, (0, 1)^\omega$ and $wt^P_2 = 2, (2, 0)^\omega$ in $P$, and $wt^Q_1 = (2, 1)^\omega$, $wt^Q_2 = (0, 2)^\omega$, $wt^Q_3 = 1, 2, (2, 1)^\omega$, $wt^Q_4 = 1, 0, (0, 2)^\omega$ in $Q$.

To determine whether $w$ has greater weight in $P$ or in $Q$, compare the aggregate values of the weight sequences of runs in $P$ and $Q$. Take the comparator for aggregate function $f$ that accepts a pair $(A, B)$ of weight sequences iff $f(A) \le f(B)$. For $wt_P(w) \le wt_Q(w)$ to hold, for every run $\rho^P_i$ in $P$ there must exist a run $\rho^Q_j$ in $Q$ s.t. $(\rho^P_i, \rho^Q_j)$ is accepted by the comparator. This forms the basis for quantitative inclusion.

#### **3.1 Quantitative Inclusion**

InclusionReg (Algorithm 1) is an algorithm for quantitative inclusion for ω-regular aggregate functions. For weighted ω-automata $P$, $Q$, and ω-regular comparator $A_f$, InclusionReg returns True iff $P \subseteq_f Q$. We assume $P \subseteq Q$ (qualitative inclusion) to avoid trivial corner cases.

**Key Ideas.** $P \subseteq_f Q$ holds if for every run $\rho^P$ in $P$ on word $w$, there exists a run $\rho^Q$ in $Q$ on the same word $w$ such that $f(\rho^P) \le f(\rho^Q)$. We refer to such runs of $P$ as *diminished runs*. Hence, $P \subseteq_f Q$ iff all runs of $P$ are diminished.

InclusionReg constructs a Büchi automaton *Dim* that consists of exactly the diminished runs of $P$. It returns True iff *Dim* contains all runs of $P$. To obtain *Dim*, it constructs a Büchi automaton *DimProof* that accepts the word $(\rho_P, \rho_Q)$ iff $\rho_P$ and $\rho_Q$ are runs of the same word in $P$ and $Q$ respectively, and $f(\rho_P) \leq f(\rho_Q)$. The $\omega$-regular comparator for the inequality $\leq$ for function $f$ ensures $f(\rho_P) \leq f(\rho_Q)$. The projection of *DimProof* onto the runs of $P$ results in *Dim*.

**Algorithm Details.** InclusionReg has three steps: (a) UniqueId (Lines 3–4): enables unique identification of runs in $P$ and $Q$ through *labels*. (b) Compare (Lines 5–7): compares weights of runs in $P$ with weights of runs in $Q$, and constructs *Dim*. (c) DimEnsure (Line 8): checks whether all runs of $P$ are diminished.

1. UniqueId: AugmentWtAndLabel transforms a weighted $\omega$-automaton $A$ into a Büchi automaton $\hat{A}$ by converting each transition $\tau = (s, a, t)$ with weight $\gamma(\tau)$ in $A$ into the transition $\hat{\tau} = (s, (a, \gamma(\tau), l), t)$ in $\hat{A}$, where $l$ is a unique label assigned to transition $\tau$. The word $\hat{\rho} = (a_0, n_0, l_0)(a_1, n_1, l_1)\cdots \in \hat{A}$ iff the run $\rho \in A$ on word $a_0a_1\ldots$ has weight sequence $n_0n_1\ldots$. The labels ensure a bijection between runs in $A$ and words in $\hat{A}$; every word of $\hat{A}$ has a single run in $\hat{A}$. Hence, the transformation of the weighted $\omega$-automata $P$ and $Q$ into Büchi automata $\hat{P}$ and $\hat{Q}$ enables disambiguation between runs of $P$ and $Q$ (Lines 3–4).
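The UniqueId step admits a short sketch. Here a weighted automaton is assumed to be given simply as a list of transitions (source, letter, weight, target); this representation and the function name are ours, not the paper's:

```python
def augment_wt_and_label(transitions):
    """UniqueId step (sketch): turn each weighted transition
    (s, a, w, t) into the Buchi transition (s, (a, w, l), t),
    where l is a fresh label identifying the transition."""
    return [(s, (a, w, l), t) for l, (s, a, w, t) in enumerate(transitions)]

# Two transitions of a toy weighted automaton P.
P = [("p1", "a", 3, "p2"), ("p1", "a", 2, "p3")]
P_hat = augment_wt_and_label(P)
# Each letter of P_hat now carries the weight and a unique label,
# so distinct runs of P map to distinct words of P_hat.
```

Because each label occurs on exactly one transition, two distinct runs of $P$ always yield distinct words of $\hat{P}$, which is the bijection the algorithm relies on.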

2. Compare: The output of this step is the Büchi automaton *Dim*, which contains the word $\hat{\rho} \in \hat{P}$ iff $\rho$ is a diminished run in $P$ (Lines 5–7).

MakeProduct($\hat{P}$, $\hat{Q}$) constructs $\hat{P} \times \hat{Q}$ s.t. the word $(\hat{\rho}_P, \hat{\rho}_Q) \in \hat{P} \times \hat{Q}$ iff $\rho_P$ and $\rho_Q$ are runs of the same word in $P$ and $Q$ respectively (Line 5). Concretely, for transitions $\hat{\tau}_A = (s_A, (a, n_A, l_A), t_A)$ in automaton $\hat{A}$, where $A \in \{\hat{P}, \hat{Q}\}$, the transition $\hat{\tau}_P \times \hat{\tau}_Q = ((s_P, s_Q), (a, n_P, l_P, n_Q, l_Q), (t_P, t_Q))$ is in $\hat{P} \times \hat{Q}$.
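A minimal sketch of the MakeProduct step, under the same list-of-transitions representation as above (names are ours): transitions are paired exactly when they read the same underlying letter.

```python
from itertools import product

def make_product(P_hat, Q_hat):
    """Product construction (sketch): pair transitions of P_hat and
    Q_hat that read the same underlying letter a, keeping both
    weight-label annotations."""
    prod = []
    for (sp, (a1, wp, lp), tp), (sq, (a2, wq, lq), tq) in product(P_hat, Q_hat):
        if a1 == a2:  # runs must be on the same word
            prod.append(((sp, sq), (a1, wp, lp, wq, lq), (tp, tq)))
    return prod

P_hat = [("p1", ("a", 3, 0), "p2")]
Q_hat = [("q1", ("a", 2, 0), "q2"), ("q1", ("b", 0, 1), "q3")]
prod = make_product(P_hat, Q_hat)
# Only the letter-compatible pair survives in the product.
```

The product word carries both weight-sequences side by side, which is exactly the input format the comparator automaton reads.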


**Lemma 1.** *Given weighted $\omega$-automata $P$ and $Q$ with an $\omega$-regular aggregate function $f$, InclusionReg($P$, $Q$, $\mathcal{A}_f$) returns True iff $P \subseteq_f Q$.*

Further, InclusionReg can be adapted to quantitative *strict* inclusion $P \subset_f Q$, i.e., for all words $w$, $wt_P(w) < wt_Q(w)$, by taking the $\omega$-regular comparator $\mathcal{A}_f$ that accepts $(A, B)$ iff $f(A) < f(B)$. Similarly for quantitative equivalence $P \equiv_f Q$.

**Complexity Analysis.** All operations in InclusionReg up to Line 7 are polynomial-time in the size of the weighted $\omega$-automata $P$, $Q$ and the comparator $\mathcal{A}_f$. Hence, *Dim* is polynomial in the size of $P$, $Q$ and $\mathcal{A}_f$. Line 8 solves a PSPACE-complete problem. Therefore, quantitative inclusion for an $\omega$-regular aggregate function $f$ is in PSPACE in the size of the inputs $P$, $Q$, and $\mathcal{A}_f$.

The PSPACE-hardness of quantitative inclusion is established via reduction from the *qualitative* inclusion problem, which is PSPACE-complete. The reduction is as follows: let $P$ and $Q$ be Büchi automata (with all states accepting). Turn $P$, $Q$ into weighted automata by assigning weight 1 to each transition. Since all runs in $P$, $Q$ then have the same weight sequence, the weight of every word in $P$ and $Q$ is the same for any function $f$. It is easy to see that $P \subseteq Q$ (qualitative inclusion) iff $P \subseteq_f Q$ (quantitative inclusion).

**Theorem 1.** *Let $P$ and $Q$ be weighted $\omega$-automata and $\mathcal{A}_f$ be an $\omega$-regular comparator. The quantitative inclusion, quantitative strict-inclusion, and quantitative equivalence problems for an $\omega$-regular aggregate function $f$ are* PSPACE*-complete.*

Theorem 1 extends to weighted $\omega$-automata in which the weight of a word is the *infimum* of the weights of its runs. The key idea for $P \subseteq_f Q$ here is to ensure that for every run $\rho_Q$ in $Q$ there exists a run $\rho_P$ on the same word in $P$ s.t. $f(\rho_P) \leq f(\rho_Q)$.

**Representation of Counterexamples.** When $P \not\subseteq_f Q$, there exist word(s) $w \in \Sigma^\omega$ s.t. $wt_P(w) > wt_Q(w)$. Such a word $w$ is said to be a *counterexample word*. Previously, finite-state representations of counterexamples have been useful in verification and synthesis of qualitative systems [5], and could be useful in quantitative settings as well. However, we are not aware of procedures for such representations in quantitative settings. Here we show that a simple extension of InclusionReg yields Büchi-automaton representations of all counterexamples of the quantitative inclusion problem for $\omega$-regular functions.

For a word $w$ to be a counterexample, it must have a run in $P$ that is not diminished. Clearly, all non-diminished runs of $P$ are members of $\hat{P} \setminus$ *Dim*. The counterexample words can be obtained from $\hat{P} \setminus$ *Dim* by projecting its alphabet onto the alphabet of $P$, i.e., by dropping the transition weights and their unique labels.

**Theorem 2.** *All counterexamples of the quantitative inclusion problem for an $\omega$-regular aggregate function can be expressed by a Büchi automaton.*

#### **3.2 Incomplete-Information Quantitative Games**

Given an incomplete-information quantitative game $G = (S, s_I, O, \Sigma, \delta, \gamma, f)$, our objective is to determine whether player $P_0$ has a winning strategy $\alpha : O^* \to \Sigma$ for $\omega$-regular aggregate function $f$. We assume we are given the $\omega$-regular comparator $\mathcal{A}_f$ for function $f$. Note that a function $A^* \to B$ can be treated as a $B$-labeled $A$-tree, and vice versa. Hence, we proceed by finding a $\Sigma$-labeled $O$-tree, the *winning strategy tree*. Every branch of a winning strategy tree is an observed play $o_\rho$ of $G$ for which every actual play $\rho$ is a winning play for $P_0$.

We first consider all *game trees* of $G$ by interpreting $G$ as a tree automaton over $\Sigma$-labeled $S$-trees. Nodes $n \in S^*$ of the game tree correspond to states in $S$ and are labeled by the actions in $\Sigma$ taken by player $P_0$. Thus, the *root node* $\varepsilon$ corresponds to $s_I$, and a node $s_{i_0}, \ldots, s_{i_k}$ corresponds to the state $s_{i_k}$ reached via $s_I, s_{i_0}, \ldots, s_{i_{k-1}}$. Consider now a node $x$ corresponding to state $s$ and labeled by an action $\sigma$. Then $x$ has children $xs_1, \ldots, xs_n$, one for every $s_i \in S$. If $s_i \in \delta(s, \sigma)$, we call $xs_i$ a *valid* child; otherwise we call it an *invalid* child. Branches that contain invalid children correspond to invalid plays.

A game tree $\tau$ is a *winning tree* for player $P_0$ if every branch of $\tau$ is either a winning play for $P_0$ or an invalid play of $G$. One can check with an automaton whether a play is invalid, by the presence of invalid children. Furthermore, the winning condition for $P_0$ can be expressed by the $\omega$-regular comparator $\mathcal{A}_f$ that accepts $(A, B)$ iff $f(A) > f(B)$. To use the comparator $\mathcal{A}_f$, it is determinized into a parity automaton $\mathcal{D}_f$. Thus, the product of the game $G$ with $\mathcal{D}_f$ is a deterministic parity tree automaton accepting precisely the winning trees for player $P_0$.

Winning trees for player $P_0$ are $\Sigma$-labeled $S$-trees. We need to convert them into $\Sigma$-labeled $O$-trees. Recall that every state has a unique observation. We can simulate these $\Sigma$-labeled $S$-trees on strategy trees using the technique of *thinning* states $S$ to observations $O$ [19]. The resulting alternating parity tree automaton $\mathcal{M}$ accepts a $\Sigma$-labeled $O$-tree $\tau_o$ iff every actual game tree $\tau$ of $\tau_o$ is a winning tree for $P_0$ with respect to the strategy $\tau_o$. The existence of a winning strategy for $P_0$ thus reduces to non-emptiness checking of $\mathcal{M}$.

**Theorem 3.** *Given an incomplete-information quantitative game $G$ and an $\omega$-regular comparator $\mathcal{A}_f$ for the aggregate function $f$, the complexity of determining whether $P_0$ has a winning strategy is exponential in $|G| \cdot |\mathcal{D}_f|$, where $|\mathcal{D}_f| = |\mathcal{A}_f|^{O(|\mathcal{A}_f|)}$.*

Since $\mathcal{D}_f$ is the deterministic parity automaton equivalent to $\mathcal{A}_f$, $|\mathcal{D}_f| = |\mathcal{A}_f|^{O(|\mathcal{A}_f|)}$. The thinning operation is linear in the size of $|G \times \mathcal{D}_f|$, so $|\mathcal{M}| = |G| \cdot |\mathcal{D}_f|$. Non-emptiness checking of alternating parity tree automata is exponential. Therefore, our procedure is doubly exponential in the size of the comparator and exponential in the size of the game. The question of tighter bounds is open.

### **4 Discounted-Sum Comparator**

The discounted-sum of an infinite sequence $A$ with discount-factor $d > 1$, denoted $DS(A, d)$, is defined as $\sum_{i=0}^{\infty} A[i]/d^i$. The discounted-sum comparator (DS-comparator, in short) for discount-factor $d$, denoted $\mathcal{A}_{DS(d)}$, accepts a pair $(A, B)$ of weight sequences iff $DS(A, d) < DS(B, d)$. We investigate properties of the DS-comparator, and show that the DS-comparator is $\omega$-regular for all integer discount-factors $d$, and cannot be $\omega$-regular when $1 < d < 2$.
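For ultimately periodic sequences the discounted-sum has a geometric-series closed form, which the following Python sketch evaluates (the function name and the (prefix, period) representation are ours):

```python
def ds(prefix, period, d):
    """Discounted-sum of the ultimately periodic sequence
    prefix . period^omega for discount-factor d > 1:
    DS = sum(prefix[i]/d^i) + (1/d^|prefix|) * DS_block / (1 - d^-|period|)."""
    head = sum(a / d**i for i, a in enumerate(prefix))
    block = sum(a / d**i for i, a in enumerate(period))
    tail = block / d**len(prefix) / (1 - d**(-len(period)))
    return head + tail

# The weight-sequence 3,(0,1)^omega from Sect. 3, for d = 2,
# evaluates to 3 + 1/4 + 1/16 + ... = 3 + 1/3.
value = ds([3], [0, 1], 2)
```

The closed form follows directly from summing the geometric series contributed by each repetition of the period.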

**Theorem 4.** *The DS-comparator for rational discount-factor $1 < d < 2$ is not $\omega$-regular.*

For a discounted-sum automaton $A$ with discount-factor $d$, the *cut-point language* of $A$ w.r.t. $r \in \mathbb{R}$ is defined as $L^{\geq r} = \{w \in L(A) \mid DS(w, d) \geq r\}$. It is known that the cut-point language $L^{\geq 1}$ with discount-factor $1 < d < 2$ is not $\omega$-regular [9]. One can show that if the DS-comparator for discount-factor $1 < d < 2$ were $\omega$-regular, then the cut-point language $L^{\geq 1}$ would also be $\omega$-regular, thus proving Theorem 4.

We provide the construction of the DS-comparator for integer discount-factors.

**Key Ideas.** The core intuition is that sequences bounded by μ can be converted to their value in base d via a finite-state transducer. Lexicographic comparison of the resulting sequences renders the desired result. Conversion of sequences to base d requires a certain amount of *book-keeping* by the transducer. Here we describe a direct method for book-keeping and lexicographic comparison.

For a natural-number sequence $A$ and integer discount-factor $d > 1$, $DS(A, d)$ can be interpreted as a value in base $d$, i.e., $DS(A, d) = A[0] + \frac{A[1]}{d} + \frac{A[2]}{d^2} + \cdots = (A[0].A[1]A[2]\ldots)_d$ [12]. Unlike for numbers in base $d$, the lexicographically larger sequence need not be larger in value. This occurs because (i) the elements of weight sequences may be larger in value than the base $d$, and (ii) every value has multiple infinite-sequence representations.

To overcome these challenges, we resort to arithmetic techniques in base $d$. Note that $DS(B, d) > DS(A, d)$ iff there exists a sequence $C$ such that $DS(B, d) = DS(A, d) + DS(C, d)$ and $DS(C, d) > 0$. Therefore, to compare the discounted-sums of $A$ and $B$, we obtain such a sequence $C$. Arithmetic in base $d$ also produces a sequence $X$ of carry elements. Then, we see:

**Lemma 2.** *Let $A, B, C, X$ be number sequences and $d > 1$ a positive integer such that the following equations hold:*

*1. When $i = 0$: $A[0] + C[0] + X[0] = B[0]$*
*2. When $i \geq 1$: $A[i] + C[i] + X[i] = B[i] + d \cdot X[i-1]$*

*Then $DS(B, d) = DS(A, d) + DS(C, d)$.*
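Lemma 2 can be sanity-checked numerically on truncations. In this Python sketch the sequences $A$, $C$ and the carry sequence $X$ are hypothetical witnesses chosen by hand; Eqs. 1 and 2 then determine $B$:

```python
def truncated_ds(seq, d, n):
    """Approximate DS of an ultimately periodic sequence, given as
    (prefix, period), from its first n elements."""
    prefix, period = seq
    get = lambda i: prefix[i] if i < len(prefix) else period[(i - len(prefix)) % len(period)]
    return sum(get(i) / d**i for i in range(n))

# Hypothetical witnesses: A = (1)^omega, C = (1)^omega, carries X = (1)^omega, d = 2.
# Eq. 1 gives B[0] = 1 + 1 + 1 = 3; Eq. 2 gives B[i] = 1 + 1 + 1 - 2*1 = 1 for i >= 1.
d, n = 2, 60
A, C, B = ([], [1]), ([], [1]), ([3], [1])
lhs = truncated_ds(B, d, n)
rhs = truncated_ds(A, d, n) + truncated_ds(C, d, n)
# The truncations agree: DS(B, d) = DS(A, d) + DS(C, d) = 2 + 2 = 4.
```

With non-zero carries the example is not mere element-wise addition, yet the discounted-sums still match, as the lemma promises.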

Hence, to determine $DS(B, d) - DS(A, d)$, we systematically guess the sequences $C$ and $X$ using these equations, element by element, beginning at index 0 and moving rightwards. There are two crucial observations: (i) computation of the $i$-th elements of $C$ and $X$ depends only on the $i$-th and $(i-1)$-th elements of $A$ and $B$, so guessing $C[i]$ and $X[i]$ requires only *finite memory*; (ii) $C$ is a representation of the value $DS(B, d) - DS(A, d)$ in base $d$, and $X$ is the carry sequence. Hence, if $A$ and $B$ are bounded integer sequences, not only are $X$ and $C$ bounded sequences, they can be constructed from a *fixed finite set of integers*:

**Lemma 3.** *Let $d > 1$ be an integer discount-factor. Let $A$ and $B$ be non-negative integer sequences bounded by $\mu$ s.t. $DS(A, d) < DS(B, d)$. Let $C$ and $X$ be as constructed in Lemma 2. There exists at least one pair of integer sequences $C$ and $X$ that satisfy the following two equations:*

*1. For all $i \geq 0$: $0 \leq C[i] \leq \mu \cdot \frac{d}{d-1}$*
*2. For all $i \geq 0$: $0 \leq |X[i]| \leq 1 + \frac{\mu}{d-1}$*

In the Büchi automaton $\mathcal{A}_{DS(d)}$: (i) states are represented by $(x, c)$, where $x$ and $c$ range over all possible elements of $X$ and $C$, which are finite in number; (ii) there is a special start state $s$; (iii) transitions from the start state $(s, (a, b), (x, c))$ satisfy $a + c + x = b$, replicating Eq. 1 (Lemma 2) at index 0; (iv) all other transitions $((x_1, c_1), (a, b), (x_2, c_2))$ satisfy $a + c_2 + x_2 = b + d \cdot x_1$, replicating Eq. 2 (Lemma 2) at indices $i > 0$; and (v) all $(x, c)$ states are accepting. Lemma 2 ensures that $\mathcal{A}_{DS(d)}$ accepts $(A, B)$ iff $DS(B, d) = DS(A, d) + DS(C, d)$.

However, $\mathcal{A}_{DS(d)}$ must also guarantee $DS(C, d) > 0$. For this, we include non-accepting states $(x, \bot)$, where $x$ ranges over all possible (finitely many) elements of $X$. Transitions into and out of states $(x, \bot)$ satisfy Eq. 1 or 2 (depending on whether the transition leaves the start state $s$), where $\bot$ is treated as $c = 0$. A transition from an $(x, \bot)$-state to an $(x, c)$-state occurs only if $c > 0$. Hence, a valid execution of $(A, B)$ is an accepting run only if it witnesses a non-zero value of $c$. Since $C$ is a non-negative sequence, this ensures $DS(C, d) > 0$.

**Construction.** Let $\mu_C = \mu \cdot \frac{d}{d-1}$ and $\mu_X = 1 + \frac{\mu}{d-1}$. $\mathcal{A}_{DS(d)} = (S, \Sigma, \delta_d, Init, \mathcal{F})$

– $S = Init \cup \mathcal{F} \cup S_\bot$ where $Init = \{s\}$, $\mathcal{F} = \{(x, c) \mid |x| \leq \mu_X, 0 \leq c \leq \mu_C\}$, and $S_\bot = \{(x, \bot) \mid |x| \leq \mu_X\}$, where $\bot$ is a special character, $c \in \mathbb{N}$, and $x \in \mathbb{Z}$.
– $\Sigma = \{(a, b) : 0 \leq a, b \leq \mu\}$ where $a$ and $b$ are integers.
– $\delta_d \subset S \times \Sigma \times S$ is defined as follows:
  1. Transitions from the start state $s$:
     i. $(s, (a, b), (x, c))$ for all $(x, c) \in \mathcal{F}$ s.t. $a + x + c = b$ and $c \neq 0$

     ii. $(s, (a, b), (x, \bot))$ for all $(x, \bot) \in S_\bot$ s.t. $a + x = b$
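Under the bounds of Lemma 3 the state space above is finite and easy to enumerate. The following Python sketch builds only the state set (the transition relation $\delta_d$ is omitted); the function name and the string `"bot"` for $\bot$ are ours:

```python
from fractions import Fraction
from math import floor

def ds_comparator_states(mu, d):
    """State space of A_DS(d) per the construction (sketch):
    start state s, accepting states (x, c) with |x| <= mu_X and
    0 <= c <= mu_C, and non-accepting states (x, 'bot')."""
    mu_C = Fraction(mu * d, d - 1)       # bound on C-elements (Lemma 3)
    mu_X = 1 + Fraction(mu, d - 1)       # bound on |X|-elements (Lemma 3)
    xs = range(-floor(mu_X), floor(mu_X) + 1)
    F = {(x, c) for x in xs for c in range(floor(mu_C) + 1)}
    S_bot = {(x, "bot") for x in xs}
    return {"s"} | F | S_bot

# For mu = 2, d = 2: mu_X = 3 and mu_C = 4, giving 7*5 + 7 + 1 = 43 states.
states = ds_comparator_states(mu=2, d=2)
```

The state count grows quadratically in $\mu$, in line with the size bound stated in Theorem 5 below.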


**Theorem 5.** *The DS-comparator with maximum bound $\mu$ is $\omega$-regular for integer discount-factors $d > 1$. The size of the discounted-sum comparator is $O(\frac{\mu^2}{d})$.*

The DS-comparators with non-strict inequality $\leq$ and equality $=$ follow similarly. Consequently, the properties of $\omega$-regular comparators hold for the DS-comparator with integer discount-factor. Specifically, DS-inclusion is PSPACE-complete in the size of the input weighted automata and the DS-comparator. Since the size of the DS-comparator is polynomial in the upper bound $\mu$ (in unary), DS-inclusion is in PSPACE in the size of the input weighted automata and $\mu$. Not only does this bound improve upon the previously known EXPTIME upper bound, it also closes the gap between upper and lower bounds for DS-inclusion.

**Corollary 1.** *Given weighted automata* P *and* Q*, maximum weight on their transitions* μ *in unary form and integer discount-factor* d > 1*, the DS-inclusion, DS-strict-inclusion, and DS-equivalence problems are* PSPACE*-complete.*

As mentioned earlier, the known upper bound for discounted-sum inclusion with integer discount-factor was exponential [6,10]. That bound is based on an exponential determinization construction (subset construction) combined with arithmetical reasoning. We observe that the determinization construction can be performed on-the-fly in PSPACE. Performing the arithmetical reasoning on-the-fly in PSPACE, however, would essentially require the same bit-level ($(x, c)$-state) techniques that we have used to construct the DS-comparator automaton.

### **5 Limit-Average Comparator**

The limit-average of an infinite sequence $M$ is the point of convergence of the averages of prefixes of $M$. Let $Sum(M[0, n-1])$ denote the sum of the $n$-length prefix of sequence $M$. The *limit-average infimum*, denoted $LimInfAvg(M)$, is defined as $\liminf_{n \to \infty} \frac{1}{n} \cdot Sum(M[0, n-1])$. Similarly, the *limit-average supremum*, denoted $LimSupAvg(M)$, is defined as $\limsup_{n \to \infty} \frac{1}{n} \cdot Sum(M[0, n-1])$. The limit-average of sequence $M$, denoted $LimAvg(M)$, is defined *only if* the limit-average infimum and limit-average supremum coincide, in which case $LimAvg(M) = LimInfAvg(M)$ ($= LimSupAvg(M)$). Note that while the limit-average infimum and supremum exist for all bounded sequences, the limit-average may not.
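For ultimately periodic sequences the prefix averages can be computed directly; a small Python sketch (helper names and the (prefix, period) representation are ours):

```python
def prefix_avgs(prefix, period, n):
    """Averages of the first 1..n prefixes of the ultimately
    periodic sequence prefix . period^omega."""
    get = lambda i: prefix[i] if i < len(prefix) else period[(i - len(prefix)) % len(period)]
    total, avgs = 0, []
    for i in range(n):
        total += get(i)
        avgs.append(total / (i + 1))
    return avgs

# For (1, 0)^omega the prefix averages converge, so
# LimInfAvg = LimSupAvg = LimAvg = 1/2.
avgs = prefix_avgs([], [1, 0], 10000)
```

For a periodic sequence the averages always converge to the average of one period; sequences without a limit-average require aperiodic behavior, such as blocks of 0s and 2s of ever-doubling length.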

In existing work, the limit-average is defined as the limit-average infimum (or the limit-average supremum) to ensure that it exists for all sequences [7,10,11,22]. While this definition is justified in the context of those applications, it can lead to misleading comparisons. For example, consider a sequence $A$ with $LimSupAvg(A) = 2$ and $LimInfAvg(A) = 0$, and a sequence $B$ with $LimAvg(B) = 1$. Clearly, the limit-average of $A$ does not exist. If we set $LimAvg(A) = LimInfAvg(A) = 0$, then $LimAvg(A) < LimAvg(B)$, suggesting that the averages of prefixes of $A$ are eventually always less than those of $B$. This is untrue, since $LimSupAvg(A) = 2$.

Such inaccuracies in limit-average comparison may occur when the limit-average of at least one sequence does not exist. However, it is not easy to distinguish sequences for which the limit-average exists from those for which it does not.

We define *prefix-average comparison* as a relaxation of limit-average comparison. Prefix-average comparison coincides with limit-average comparison when the limit-average exists for both sequences. Otherwise, it determines whether the averages of prefixes of one sequence are eventually greater than those of the other. This comparison does not require the limit-average to exist in order to return intuitive results. Further, we show that the *prefix-average comparator* is $\omega$-context-free.

#### **5.1 Limit-Average Language and Comparison**

Let $\Sigma = \{0, 1, \ldots, \mu\}$ be a finite alphabet with $\mu > 0$. The *limit-average language* $L_{LA}$ contains the sequence (word) $A \in \Sigma^\omega$ iff its limit-average exists. Suppose $L_{LA}$ were $\omega$-regular; then $L_{LA} = \bigcup_{i=0}^{n} U_i \cdot V_i^\omega$, where $U_i, V_i \subseteq \Sigma^*$ are regular languages over *finite* words. The limit-average of a sequence is determined by its behavior in the limit, so the limit-average of every sequence in $V_i^\omega$ exists. Additionally, the averages of all (finite) words in $V_i$ must be equal. If this were not the case, two words in $V_i$ with unequal averages $l_1$ and $l_2$ could generate a word $w \in V_i^\omega$ s.t. the averages of its prefixes oscillate between $l_1$ and $l_2$; this cannot occur, since the limit-average of $w$ exists. Let the average of words in $V_i$ be $a_i$; then the limit-average of sequences in $V_i^\omega$ and $U_i \cdot V_i^\omega$ is also $a_i$. This is a contradiction, since $L_{LA}$ contains sequences with limit-average different from every $a_i$ (see appendix). Similarly, since every $\omega$-CFL is of the form $\bigcup_{i=1}^{n} U_i \cdot V_i^\omega$ for CFLs $U_i, V_i$ over finite words [13], a similar argument proves that $L_{LA}$ is not $\omega$-context-free.

The quantifiers $\exists^\infty i$ and $\exists^f i$ denote the existence of *infinitely many* and *only finitely many* indices $i$, respectively.

**Theorem 6.** *$L_{LA}$ is neither an $\omega$-regular nor an $\omega$-context-free language.*

In the next section, we will define *prefix-average comparison* as a relaxation of limit-average comparison. To show how prefix-average comparison relates to limit-average comparison, we will require the following two lemmas:

**Lemma 4.** *Let $A$ and $B$ be sequences s.t. their limit-averages exist. If $\exists^\infty i$, $Sum(A[0, i-1]) \geq Sum(B[0, i-1])$, then $LimAvg(A) \geq LimAvg(B)$.*

**Lemma 5.** *Let $A$, $B$ be sequences s.t. their limit-averages exist. If $LimAvg(A) > LimAvg(B)$, then $\exists^f i$, $Sum(B[0, i-1]) \geq Sum(A[0, i-1])$ and $\exists^\infty i$, $Sum(A[0, i-1]) > Sum(B[0, i-1])$.*

#### **5.2 Prefix-Average Comparison and Comparator**

The previous section relates limit-average comparison to the sums of equal-length prefixes of the sequences (Lemmas 4 and 5). The comparison criterion is based on the number of times the prefix-sum of one sequence is greater than that of the other, which does not rely on the existence of the limit-average. Unfortunately, this criterion cannot be used for limit-average comparison, since it is incomplete (Lemma 5). Specifically, for sequences $A$ and $B$ with equal limit-averages it is possible that $\exists^\infty i$, $Sum(A[0, i-1]) > Sum(B[0, i-1])$ and $\exists^\infty i$, $Sum(B[0, i-1]) > Sum(A[0, i-1])$. Instead, we use this criterion to define *prefix-average comparison*. In this section, we define prefix-average comparison and explain how it relaxes limit-average comparison. Lastly, we construct the prefix-average comparator and prove that it is not $\omega$-regular but is $\omega$-context-free.

**Definition 7 (Prefix-average comparison).** *Let $A$ and $B$ be number sequences. We say $PrefixAvg(A) \geq PrefixAvg(B)$ if $\exists^f i$, $Sum(B[0, i-1]) \geq Sum(A[0, i-1])$ and $\exists^\infty i$, $Sum(A[0, i-1]) > Sum(B[0, i-1])$.*

Intuitively, prefix-average comparison states that $PrefixAvg(A) \geq PrefixAvg(B)$ if eventually the prefix-sums of $A$ are always greater than those of $B$. We use $\geq$ since the averages of prefixes may be equal when the difference between the sums is small. The comparison coincides with limit-average comparison when the limit-average exists for both sequences. Definition 7 and Lemmas 4 and 5 relate limit-average comparison and prefix-average comparison:

**Corollary 2.** *When the limit-averages of $A$ and $B$ exist, then:*

*– $PrefixAvg(A) \geq PrefixAvg(B) \implies LimAvg(A) \geq LimAvg(B)$.*
*– $LimAvg(A) > LimAvg(B) \implies PrefixAvg(A) \geq PrefixAvg(B)$.*

Therefore, limit-average comparison and prefix-average comparison return the same result on sequences for which the limit-average exists. In addition, prefix-average comparison returns intuitive results even when the limit-average may not exist. For example, suppose the limit-averages of $A$ and $B$ do not exist, but $LimInfAvg(A) > LimSupAvg(B)$; then $PrefixAvg(A) \geq PrefixAvg(B)$. Therefore, prefix-average comparison relaxes limit-average comparison.
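The following Python sketch illustrates prefix-average comparison on two hypothetical periodic sequences, checking that only finitely many prefix-sums of $B$ dominate those of $A$ (names and the index-function representation are ours):

```python
def prefix_sums(seq_fn, n):
    """First n prefix-sums Sum(M[0, i-1]) of a sequence given by an
    index function seq_fn."""
    sums, total = [], 0
    for i in range(n):
        total += seq_fn(i)
        sums.append(total)
    return sums

# Hypothetical pair: A = (0^p 1^2p)^omega, B = (1^p 0^2p)^omega with
# p = 2, so LimAvg(A) = 2/3 > 1/3 = LimAvg(B).
p, n = 2, 300
A = lambda i: 0 if i % (3 * p) < p else 1
B = lambda i: 1 if i % (3 * p) < p else 0
sa, sb = prefix_sums(A, n), prefix_sums(B, n)
# Only finitely many indices satisfy Sum(B) >= Sum(A), witnessing
# PrefixAvg(A) >= PrefixAvg(B); beyond last_flip, A's sums dominate.
last_flip = max(i for i in range(n) if sb[i] >= sa[i])
```

Past the index `last_flip` the prefix-sum difference stays strictly positive, which is exactly the eventual-dominance condition of Definition 7.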

The rest of this section describes the *prefix-average comparator* $\mathcal{A}_{PA(\cdot)}$, an automaton that accepts the pair $(A, B)$ of sequences iff $PrefixAvg(A) \geq PrefixAvg(B)$.

**Lemma 6** *(Pumping Lemma for $\omega$-regular languages [2]). Let $L$ be an $\omega$-regular language. There exists $p \in \mathbb{N}$ such that, for each $w = u_1w_1u_2w_2\cdots \in L$ with $|w_i| \geq p$ for all $i$, there are sequences of finite words $(x_i)_{i \in \mathbb{N}}$, $(y_i)_{i \in \mathbb{N}}$, $(z_i)_{i \in \mathbb{N}}$ s.t., for all $i$, $w_i = x_iy_iz_i$, $|x_iy_i| \leq p$, $|y_i| > 0$, and for every sequence of pumping factors $(j_i)_{i \in \mathbb{N}} \in \mathbb{N}$, the pumped word $u_1x_1y_1^{j_1}z_1u_2x_2y_2^{j_2}z_2 \cdots \in L$.*

**Theorem 7.** *The prefix-average comparator is not* ω*-regular.*

*Proof (Sketch).* We use Lemma 6 to prove that $\mathcal{A}_{PA(\cdot)}$ is not $\omega$-regular. Suppose $\mathcal{A}_{PA(\cdot)}$ were $\omega$-regular. For $p > 0 \in \mathbb{N}$, let $w = (A, B) = ((0,1)^p(1,0)^{2p})^\omega$. The segment $(0,1)^*$ can be pumped s.t. the resulting word is no longer in $L_{PA(\cdot)}$.

Concretely, $A = (0^p1^{2p})^\omega$, $B = (1^p0^{2p})^\omega$, $LimAvg(A) = \frac{2}{3}$, $LimAvg(B) = \frac{1}{3}$. So $w = (A, B) \in \mathcal{A}_{PA(\cdot)}$. Select as factor $w_i$ (from Lemma 6) the sequence $(0,1)^p$. Pump each $y_i$ enough times so that the resulting word is $\hat{w} = (\hat{A}, \hat{B}) = ((0,1)^{m_i}(1,0)^{2p})^\omega$ with $m_i > 4p$. It is easy to show that $\hat{w} = (\hat{A}, \hat{B}) \notin L_{PA(\cdot)}$.

We now discuss the key ideas and sketch the construction of the prefix-average comparator. The term *prefix-sum difference at $i$* denotes $Sum(A[0, i-1]) - Sum(B[0, i-1])$, i.e., the difference between the sums of the $i$-length prefixes of $A$ and $B$.

**Key Ideas.** For sequences $A$ and $B$ to satisfy $PrefixAvg(A) \geq PrefixAvg(B)$, we need $\exists^f i$, $Sum(B[0, i-1]) \geq Sum(A[0, i-1])$ and $\exists^\infty i$, $Sum(A[0, i-1]) > Sum(B[0, i-1])$. This occurs iff there exists an index $N$ s.t. for all indices $i > N$, $Sum(A[0, i-1]) - Sum(B[0, i-1]) > 0$. While reading a word, the prefix-sum difference is maintained by the states and the stack of the $\omega$-PDA: the states record whether it is positive or not, while the number of tokens on the stack equals its absolute value. The automaton non-deterministically guesses the aforementioned index $N$, beyond which it ensures that the prefix-sum difference remains positive.

**Construction Sketch.** The push-down comparator $\mathcal{A}_{PA(\cdot)}$ consists of three states: (i) state $s_P$ and (ii) state $s_N$, indicating that the prefix-sum difference is greater than zero or not, respectively, and (iii) an accepting state $s_F$. An execution of $(A, B)$ begins in state $s_N$ with an empty stack. On reading letter $(a, b)$, the automaton pushes or pops $|a - b|$ tokens, depending on the current state of the execution. In state $s_P$, it pushes tokens if $a - b > 0$, and pops otherwise; the opposite occurs in state $s_N$. A state transition between $s_N$ and $s_P$ occurs only if the stack action is to pop but the stack contains only $k < |a - b|$ tokens. In this case, the stack is emptied, the state transition is performed, and $|a - b| - k$ tokens are pushed onto the stack. For an execution of $(A, B)$ to be an accepting run, the automaton non-deterministically transitions into state $s_F$. State $s_F$ behaves like state $s_P$, except that the execution terminates if there are not enough tokens to pop from the stack. $\mathcal{A}_{PA(\cdot)}$ accepts by accepting state.

To see why the construction is correct, it suffices to prove that at each index $i$ the number of tokens on the stack equals $|Sum(A[0, i-1]) - Sum(B[0, i-1])|$; furthermore, in state $s_N$, $Sum(A[0, i-1]) - Sum(B[0, i-1]) \leq 0$, and in states $s_P$ and $s_F$, $Sum(A[0, i-1]) - Sum(B[0, i-1]) > 0$. The index at which the automaton transitions to the accepting state $s_F$ coincides with the index $N$. The execution is accepted if it has an infinite run in state $s_F$, which allows transitions only if $Sum(A[0, i-1]) - Sum(B[0, i-1]) > 0$.
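The stack discipline can be illustrated by tracking only the token count and the state. In this Python sketch (a simplification of ours that keeps ties in the current state and omits the non-deterministic jump to $s_F$), the token count matches the absolute prefix-sum difference at every index:

```python
def run_pa_counter(pairs):
    """Simulate the stack of the prefix-average comparator (sketch):
    the token count tracks |Sum(A[0,i-1]) - Sum(B[0,i-1])| and the
    state records whether the difference is positive."""
    state, tokens, hist = "sN", 0, []
    for a, b in pairs:
        d = a - b
        # In sP the automaton pushes on d > 0 and pops otherwise;
        # the opposite holds in sN. (Ties keep the current state here.)
        amt = d if state == "sP" else -d
        if amt >= 0:
            tokens += amt          # push |a - b| tokens
        elif tokens >= -amt:
            tokens += amt          # pop |a - b| tokens
        else:
            # Stack underflow: empty the stack, flip state, push remainder.
            tokens = -amt - tokens
            state = "sP" if state == "sN" else "sN"
        hist.append((state, tokens))
    return hist

# A = (0^2 1^4)^omega vs B = (1^2 0^4)^omega: A's prefix-sums
# eventually dominate, so the run settles in the positive state.
pairs = ([(0, 1)] * 2 + [(1, 0)] * 4) * 5
hist = run_pa_counter(pairs)
```

On this input the run flips from $s_N$ to $s_P$ once and then stays on the positive side, mirroring the guessed index $N$ of the construction.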

**Theorem 8.** *The prefix-average comparator is an* ω*-CFL.*

While ω-CFLs can be easily expressed, they lack desirable closure properties, and many decision problems on ω-CFLs are undecidable. Hence, the application of ω-context-free comparators will require further investigation.

### **6 Conclusion**

In this paper, we identified a novel mode of comparison in quantitative systems: the online comparison of aggregate values of sequences of quantitative weights. This notion is embodied by comparator automata, which read two infinite sequences of weights synchronously and relate their aggregate values. We showed that ω-regular comparators not only yield generic algorithms for problems including quantitative inclusion and winning strategies in incomplete-information quantitative games; they also result in algorithmic advances. We showed that the discounted-sum inclusion problem is PSPACE-complete for integer discount factors, hence closing a complexity gap. We also studied the discounted-sum and prefix-average comparators, which are ω-regular and ω-context-free, respectively.

We believe comparators, especially ω-regular comparators, can be of significant utility in the verification and synthesis of quantitative systems, as demonstrated by the existence of finite representations of counterexamples for the quantitative inclusion problem. Another potential application is computing equilibria in quantitative games. Applications of the prefix-average comparator, and of ω-context-free comparators in general, are open to further investigation. Another direction to pursue is to study aggregate functions in more detail, and to develop a clearer understanding of when aggregate functions are ω-regular.

**Acknowledgements.** We thank the anonymous reviewers for their comments. We thank K. Chatterjee, L. Doyen, G. A. Perez and J. F. Raskin for corrections to earlier drafts, and their contributions to this paper. We thank P. Ganty and R. Majumdar for preliminary discussions on the limit-average comparator. This work was partially supported by NSF Grant No. 1704883, "Formal Analysis and Synthesis of Multiagent Systems with Incentives".

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Logics and Equational Theories

### **Modular Tableaux Calculi for Separation Theories**

Simon Docherty1(B) and David Pym1,2

<sup>1</sup> University College London, London, UK {simon.docherty.14,d.pym}@ucl.ac.uk <sup>2</sup> The Alan Turing Institute, London, UK

**Abstract.** In recent years, the key principles behind Separation Logic have been generalized to generate formalisms for a number of verification tasks in program analysis via the formulation of 'non-standard' models utilizing notions of separation distinct from heap disjointness. These models can typically be characterized by a *separation theory*, a collection of first-order axioms in the signature of the model's underlying ordered monoid. While all separation theories are interpreted by models that instantiate a common mathematical structure, many are undefinable in Separation Logic and determine different classes of valid formulae, leading to incompleteness for existing proof systems. Generalizing systems utilized in the proof theory of bunched logics, we propose a framework of tableaux calculi that are generically extendable by rules that correspond to separation theories axiomatized by coherent formulas. This class covers all separation theories in the literature—for both classical and intuitionistic Separation Logic—as well as axioms for a number of related formalisms appropriate for reasoning about complex systems, security, and concurrency. Parametric soundness and completeness of the framework is proved by a novel representation of tableaux systems as coherent theories, suggesting a strategy for implementation and a tentative first step towards a new logical framework for non-classical logics.

**Keywords:** Bunched logic · Coherent logic · Kripke semantics · Proof theory · Separation logic · Separation theories · Substructural logic · Tableaux

### **1 Introduction**

Separation Logic [39], introduced by Ishtiaq and O'Hearn [32], Reynolds [44], Yang and O'Hearn [50], is a Hoare-style program logic suitable for reasoning about programs that mutate data structures. In its original formulation, the assertion language of Separation Logic is based on a model of O'Hearn and Pym's logic of bunched implications [40] formulated by considering heaps as possible worlds with internal structure that allows their decomposition into separate pieces of memory. This decomposition is witnessed in the logic by the *separating conjunction* <sup>∗</sup>, with <sup>φ</sup> <sup>∗</sup> <sup>ψ</sup> informally read as 'the heap can be split into *separate* parts; one satisfying φ and the other satisfying ψ'.

Calcagno et al. [13] abstract the details of the heap model to a structure called a *separation algebra*: a partial-deterministic and cancellative monoid model of the Boolean logic of bunched implications (BBI), which can be used to generate bespoke separation logics suitable for program-analysis tasks beyond that of the original formalism. Conflicting definitions of separation algebra have since been given by adding/removing first-order properties or strengthening/weakening the monoid properties [10,14,21,24]. These mutually incompatible definitions can be encompassed in a framework of *separation theories* [10]: collections of first-order axioms (*separation properties*) common to separation logic models, with which the definition of (B)BI model can be extended. All separation logics in the literature can be seen to be models of separation theories, while the frameworks Views [21] and Iris [33] explicitly implement the idea of generating program logics parametrically in a separation theory.
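For concreteness, the original heap model instantiates these definitions: composition is union of domain-disjoint heaps (undefined otherwise), and the empty heap is the unit. A minimal sketch (ours, not from the paper), with heaps as Python dicts from locations to values:

```python
def heap_compose(h1, h2):
    """Partial-deterministic composition of heaps: defined iff the
    domains are disjoint, in which case it is the disjoint union."""
    if h1.keys() & h2.keys():
        return None              # undefined: the heaps overlap
    return {**h1, **h2}          # disjoint union

unit = {}                        # the empty heap is the unit

assert heap_compose({1: 'a'}, {2: 'b'}) == {1: 'a', 2: 'b'}
assert heap_compose({1: 'a'}, {1: 'c'}) is None   # partiality: overlap
assert heap_compose({1: 'a'}, unit) == {1: 'a'}   # unit law
# Cancellativity: h∘h1 == h∘h2 (both defined) implies h1 == h2.
```

Separation properties such as partial determinism are visible here as facts about this concrete composition, but, as discussed below, they are not definable in (B)BI itself.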

Recent work has revealed an expressivity gap between the logic of bunched implications and common separation theories in the literature, however. Brotherston and Villard [10], Larchey-Wendling and Galmiche [36] show that separation properties like indivisibility of units and partial deterministic composition determine distinct sets of valid BBI formulae, leading to the incompleteness of standard proof systems with respect to typical classes of memory models. To make matters worse, Brotherston and Villard additionally show that many separation properties (among them partial determinism) are undefinable in BBI, and thus cannot be axiomatized by the logic. These results also hold for BI, the intuitionistic logic of bunched implications. This is an increasingly relevant issue given the growing number of intuitionistic separation logics, most prominent amongst them Iris, a framework that utilizes a 'later' modality [37] that can only be nontrivially defined in intuitionistic systems.

This expressivity gap is a significant problem for Separation Logic. A theorem prover for deriving assertions satisfied by the underlying model is a necessary component of any implementation of a separation logic, with the deployable proof theory of the standard formalism crucial for its scalability to large code bases [12,50]. Standard implementations are model-specific, however, and only suitable for the heap model. In order to account for the large numbers of bespoke separation logics, as well as Views/Iris-style frameworks, we require tools that support parametrization by separation theory.

**Technical Approach.** The present work generalizes methods pioneered on tableaux systems for a range of logics including and related to BI and BBI [20,22,28,34] to specify modular tableaux calculi for the breadth of separation theories in the literature, proved sound and complete uniformly and parametrically in choice of separation theory. While previous systems implicitly implement a systematic method for constructing tableaux proof theory for bunched logics, subtle but significant changes must be made to additionally capture separation theories. Past systems can be formulated as particular instances of our framework, thus making the systematic method explicit.

First, we specify tableaux proof systems for BI and BBI, the propositional basis for Separation Logic. The key difference between our calculi and tableaux systems previously given in the literature is that we do not outsource any part of the derivation of proofs to an algebra of labels or an auxiliary proof system for constraints. Instead, we utilize *frame expansion rules* that are of the same form as the standard *logical expansion rules* of the system. These rules capture the same structural properties (and more) but can also be added/removed in a modular fashion. Crucially, this ensures separation properties—for example, partial determinism—are not hard-coded into the basic systems via the structure of labels, and facilitates the parametricity of our completeness theorem.

We extend these systems with a rule schema for separation properties that are axiomatized by *coherent formulae*: a subset of first-order formulae with a special syntactic form. This set contains every separation property that can be found in the literature and is expressive enough to include virtually any axiom that might be utilized in future. The strength of this statement is justified by a folklore result recently reconstructed by Dyckhoff and Negri [25], which shows that *every* first-order axiom can be reconstructed as an equivalent system of coherent formulae. We thus obtain a modular framework of (B)BI + Σ-tableaux systems, where Σ is an arbitrary collection of coherent axioms.

In order to prove soundness and completeness of the system, we utilize a novel representation of labelled tableaux systems as theories of coherent logic. The key insight here is that the translation of coherent formulae into tableaux rules is not one way: tableaux rules can naturally be seen as coherent formulae in a signature augmented with special predicate symbols. The parametric soundness and completeness of the framework can then be reduced to proving the soundness and completeness of Tarskian truth for coherent logic with respect to a metatableaux method, a problem positively resolved by Bezem and Coquand [4]. To our knowledge, the application of this technique to labelled tableaux is new, although, in the aforementioned work, Bezem and Coquand show how to encode the tableaux method for first-order classical logic as a coherent theory, and trace the idea of abbreviating formulae with predicate symbols to Skolem [47].

**Contributions.** We identify three principal contributions.


On points 2 and 3, we believe many tableaux systems in the literature are subsumed by this method, with their respective 'Hintikka set' completeness proofs actually localized instances of the parametric completeness theorem given here. This suggests the possibility of a logical framework for non-classical logics via the representation of tableaux systems as coherent theories. This may be related to Schmidt and Tishkovsky's [45] technique for automatically synthesising tableaux calculi for logics that can be presented as first-order theories in a particular form. We believe the "rule refinement" post-processing their tableau rules undergo after synthesis can be made redundant by instead synthesising from coherent theories, but we defer such an investigation to another occasion.

**Related Work.** While much work has been done on the proof theory of BI and BBI [9,28,29,41], as well as on proof systems for the concrete heap model of Separation Logic [5,27,30], very little exists for separation theories. A key exception to this is Hóu et al.'s [31] labelled sequent calculi for propositional abstract separation logic. There, a labelled sequent calculus for BBI is extended with rules corresponding to the most common separation properties – *partial determinism, cancellativity, indivisible unit* and *disjointness* – and completeness and cut elimination are proved. In Hóu's PhD dissertation [29] the properties *cross-split* and *splittability* are additionally handled, although completeness for these new rules requires 'non-trivial changes' to the previous proofs.

The classes of model captured by our systems strictly extend those of Hóu et al. [31]—in particular, by additionally considering classes of BI models that are appropriate for intuitionistic separation logics—and our calculi are proved complete uniformly. Our systems are also generically extendable according to a rule schema, meaning the framework should be suitable for new separation theories devised in the future. A deficiency of our approach with respect to Hóu et al.'s is the lack of an implementation, though we note that the representation of our systems as theories of coherent logic suggests off-the-shelf coherent logic provers (cf. [43]) could be used to give naive implementations of our framework.

Brotherston and Villard [10] deal with the undefinability of separation theories by defining a conservative extension of BBI called HyBBI, extending the syntax with nominals, satisfaction operators and binders. This extra expressivity leads to the axiomatizability of the undefinable separation properties. This work is not specifically concerned with proof theory, giving only a Hilbert-style system for HyBBI, and has the defect of requiring modifications to the syntax of Separation Logic. In addition, a significant theoretical reformulation would be required to capture intuitionistic separation theories this way. In contrast, in our work the necessary machinery is internalized within the proof system and both Boolean and intuitionistic cases are taken care of uniformly.

Finally, we connect our work to a line of research in proof theory investigating the generation of proof rules from coherent theories. Simpson [46] and Braüner [8] have used this technique to produce natural deduction rules, while Negri [38] has extensively developed it to generate (systems of) labelled sequent rules from frame conditions axiomatized by (generalized) coherent formulae. To our knowledge the present work is the first application of these ideas to the tableaux method. In addition, we believe the encoding of the proof systems themselves as coherent theories is novel.

### **2 Preliminaries**

**The Logics of Bunched Implications.** We first recall O'Hearn and Pym's *logics of bunched implications* BI and BBI [40], the propositional basis of Separation Logic's assertion language. BI and BBI are archetypal examples of *bunched logics*; systems given by combining the standard *additives* of classical or intuitionistic propositional logic with the *multiplicatives* of a substructural logic. This idea has been developed to give logics for reasoning about concurrency [23] and the layering structure of complex systems [17,18,22], Hennessy-Milner-style process logics for reasoning about security and systems modelling [1,19], and modal and epistemic systems for reasoning about reachability/knowledge subject to the availability of resources [20,26].

Let Prop be a set of atomic propositions, ranged over by p. The set of all formulae of (B)BI is generated by the following grammar:

$$\phi ::= \mathbf{p} \mid \top \mid \bot \mid \mathbf{I} \mid \phi \land \phi \mid \phi \lor \phi \mid \phi \to \phi \mid \phi \ast \phi \mid \phi \multimap \phi.$$

For BI, the standard connectives are interpreted intuitionistically; in BBI, classically. Negation is defined by ¬φ := φ → ⊥. Figure 1 gives Hilbert rules for the multiplicative fragment of the logics.

$$\frac{\xi \vdash \phi \quad \eta \vdash \psi}{\xi \ast \eta \vdash \phi \ast \psi} \qquad \frac{\eta \ast \phi \vdash \psi}{\eta \vdash \phi \multimap \psi} \qquad \frac{\xi \vdash \phi \multimap \psi \quad \eta \vdash \phi}{\xi \ast \eta \vdash \psi}$$

**Fig. 1.** Rules for the multiplicative fragment of (B)BI.

A *BI frame* is given by a tuple 𝒳 = (X, ≤, ◦, E), where (X, ≤) is a partial order, ◦ : X × X → P(X) is a binary composition (where P(X) denotes the power set of X), and E ⊆ X is a set of units for ◦. This structure must satisfy the following axioms, where the outermost universal quantification is left implicit:

$$\begin{array}{ll}
\text{(Comm)} & z \in x \circ y \to z \in y \circ x \\
\text{(Up)} & e \in E \land e \le e' \to e' \in E \\
\text{(Unit 1)} & \exists e \in E\,(x \in x \circ e) \\
\text{(Unit 2)} & x \in y \circ e \land e \in E \to y \le x \\
\text{(Assoc)} & t' \ge t \in x \circ y \land w \in t' \circ z \to \exists s, s', w'\,(s' \ge s \in y \circ z \land w \ge w' \in x \circ s')
\end{array}$$

The axioms formalize intuitive ideas about the composition of generic resources; for example, that the composition satisfies a generalized associativity that is compatible with the comparison order. This analysis is known as *resource semantics*.

A sound interpretation of BI is given by extending the standard poset semantics for propositional intuitionistic logic. This requires a *persistent* valuation: a map V : Prop → P(X) such that x ∈ V(p) and x ≤ y entail y ∈ V(p). We call a BI frame 𝒳 together with a persistent valuation V a *Kripke BI model*. The satisfaction relation ⊨_V is given in Fig. 2. As is standard for intuitionistic logics, persistence extends to all formulae of BI.

$$\begin{array}{lcl}
r \vDash \mathrm{p} & \text{iff} & r \in \mathcal{V}(\mathrm{p}) \qquad\quad r \vDash \top \text{ always} \qquad\quad r \nvDash \bot \\
r \vDash \phi \land \psi & \text{iff} & r \vDash \phi \text{ and } r \vDash \psi \\
r \vDash \phi \lor \psi & \text{iff} & r \vDash \phi \text{ or } r \vDash \psi \\
r \vDash \phi \to \psi & \text{iff} & \text{for all } r' \ge r,\ r' \vDash \phi \text{ implies } r' \vDash \psi \\
r \vDash \mathrm{I} & \text{iff} & r \in E \\
r \vDash \phi \ast \psi & \text{iff} & \text{there exist } r', s, t \text{ such that } r \ge r' \in s \circ t,\ s \vDash \phi \text{ and } t \vDash \psi \\
r \vDash \phi \multimap \psi & \text{iff} & \text{for all } r', s, t:\ r \le r',\ t \in r' \circ s \text{ and } s \vDash \phi \text{ imply } t \vDash \psi
\end{array}$$

*Kripke BBI models* and their

**Fig. 2.** Satisfaction for **(B)BI**. **BBI** is the case where ≤ is substituted with =.

associated semantics are given by the special case of the definitions for BI when the partial order ≤ is equality.
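The clauses of Fig. 2 can be evaluated directly over a finite model. Below is a hedged sketch of ours (not the authors' code) for the BBI case, where ≤ is equality; formulae are nested tuples and R holds the triples (x, y, z) with z ∈ x ◦ y:

```python
def sat(r, phi, R, E, V):
    """Satisfaction for BBI over a finite frame (Fig. 2 with <= taken
    to be equality). R: triples (x, y, z) with z in x∘y; E: the set of
    units; V: valuation mapping atoms to sets of worlds."""
    op = phi[0]
    if op == 'p':   return r in V[phi[1]]
    if op == 'top': return True
    if op == 'bot': return False
    if op == 'I':   return r in E
    if op == 'and': return sat(r, phi[1], R, E, V) and sat(r, phi[2], R, E, V)
    if op == 'or':  return sat(r, phi[1], R, E, V) or sat(r, phi[2], R, E, V)
    if op == 'imp': return (not sat(r, phi[1], R, E, V)) or sat(r, phi[2], R, E, V)
    if op == 'star':   # r ∈ s∘t with s ⊨ φ and t ⊨ ψ
        return any(sat(s, phi[1], R, E, V) and sat(t, phi[2], R, E, V)
                   for (s, t, z) in R if z == r)
    if op == 'wand':   # every t ∈ r∘s with s ⊨ φ must satisfy ψ
        return all((not sat(s, phi[1], R, E, V)) or sat(t, phi[2], R, E, V)
                   for (x, s, t) in R if x == r)
    raise ValueError(op)

# Toy model: worlds {0, 1}; 0 is the unit; 1∘1 is undefined.
R = {(0, 0, 0), (0, 1, 1), (1, 0, 1)}
E, V = {0}, {'p': {1}}
assert sat(1, ('star', ('p', 'p'), ('I',)), R, E, V)  # 1 = 1∘0: 1 ⊨ p, 0 ⊨ I
```

The BI case would additionally thread the order ≤ through the `imp`, `star`, and `wand` clauses, exactly as in the figure.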

**Coherent Logic.** Coherent logic is the fragment of first-order logic consisting of formulae of the form A_1(x) ∧ ⋯ ∧ A_n(x) → ∃y_1 B_1(x, y_1) ∨ ⋯ ∨ ∃y_m B_m(x, y_m), for n, m ≥ 0, where each A_i is an atomic formula involving only variables from the vector x, and each B_i is a conjunction of atomic formulae involving only variables from the vectors x and y_i. In a coherent formula, the variables x are implicitly universally quantified (with scope the whole formula), and both x and y_i may be empty. The case n = 0 is an antecedent that is always true, so the formula simply asserts the consequent ∃y_1 B_1(x, y_1) ∨ ⋯ ∨ ∃y_m B_m(x, y_m); similarly, the case m = 0 is a consequent that is always false: A_1(x) ∧ ⋯ ∧ A_n(x) → ⊥.

This fragment of first-order logic is sometimes referred to as *geometric logic*; however, we reserve this name for the generalization of the definition given here that permits the consequent to be an *infinite* disjunction. In turn, coherent logic generalizes—via the case m = 1 with empty y_1—the *Horn clause* fragment of first-order logic utilized in logic programming and in first-order theorem provers based on the resolution method.

We call a set of coherent formulae Φ a *coherent theory*. Models of coherent theories are given in the way standard for first-order logic: a *Tarskian model of* Φ is a non-empty set X together with an interpretation I, which assigns to every n-ary relation symbol R in the signature a set R^I ⊆ X^n such that, for each coherent formula in Φ and all x ∈ X, the consequent ∃y_1 ∈ X (B_1^I(x, y_1)) ∨ ⋯ ∨ ∃y_m ∈ X (B_m^I(x, y_m)) is true whenever the antecedent A_1^I(x) ∧ ⋯ ∧ A_n^I(x) is true.

Many common mathematical structures are axiomatized by coherent theories. For example, algebraic structures like groups, rings, lattices, and fields, as well as total, partial, and linear orders. Further examples are found in the theory of confluence for term rewriting systems [4,48]. Of interest for our purposes, (B)BI frames are axiomatized by coherent theories. As we will see, every known separation property is given directly as a coherent axiom, with the exception of Splittability, which can be rewritten as a coherent theory.
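As an illustration (our sketch, not the authors'), validity of a coherent axiom in a finite Tarskian model can be decided by brute force over assignments to the universally quantified variables; here for the frame axiom (Comm) from above, z ∈ x ◦ y → z ∈ y ◦ x:

```python
from itertools import product

def holds_comm(X, R):
    """Check the coherent axiom z ∈ x∘y → z ∈ y∘x on a finite frame,
    where R is the set of triples (x, y, z) with z ∈ x∘y."""
    return all((y, x, z) in R
               for (x, y, z) in product(X, repeat=3) if (x, y, z) in R)

assert holds_comm({0, 1}, {(0, 0, 0), (0, 1, 1), (1, 0, 1)})
assert not holds_comm({0, 1}, {(0, 1, 1)})   # missing the triple (1, 0, 1)
```

The same brute-force scheme extends to any coherent formula over a finite structure: enumerate the universal variables, and for each satisfying assignment of the antecedent search for witnesses to one of the existential disjuncts.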

### **3 Modular Tableaux Calculi for Separation Theories**

**The Base Tableaux Systems.** We begin with tableaux systems designed for the semantics of (B)BI as outlined in Sect. 2. As is standard for tableaux systems,

$$\langle \mathsf{T}{\land} \rangle \quad \frac{\mathsf{T}\phi \land \psi : x \in \mathcal{F}}{\langle \{\mathsf{T}\phi : x,\ \mathsf{T}\psi : x\}, \emptyset \rangle} \qquad \langle \mathsf{F}{\land} \rangle \quad \frac{\mathsf{F}\phi \land \psi : x \in \mathcal{F}}{\langle \{\mathsf{F}\phi : x\}, \emptyset \rangle \mid \langle \{\mathsf{F}\psi : x\}, \emptyset \rangle}$$

$$\langle \mathsf{T}{\lor} \rangle \quad \frac{\mathsf{T}\phi \lor \psi : x \in \mathcal{F}}{\langle \{\mathsf{T}\phi : x\}, \emptyset \rangle \mid \langle \{\mathsf{T}\psi : x\}, \emptyset \rangle} \qquad \langle \mathsf{F}{\lor} \rangle \quad \frac{\mathsf{F}\phi \lor \psi : x \in \mathcal{F}}{\langle \{\mathsf{F}\phi : x,\ \mathsf{F}\psi : x\}, \emptyset \rangle} \qquad \langle \mathsf{T}\mathrm{I} \rangle \quad \frac{\mathsf{T}\mathrm{I} : x \in \mathcal{F}}{\langle \emptyset, \{Ex\} \rangle}$$

$$\begin{array}{llll}
\langle \text{Ref} \rangle & \dfrac{Expr(x) \in \mathcal{C} \cup \mathcal{F}}{\langle \emptyset, \{x \sim x\} \rangle} & \langle \text{Trans} \rangle & \dfrac{x \sim y,\ y \sim z \in \mathcal{C}}{\langle \emptyset, \{x \sim z\} \rangle} \\[2ex]
\langle \text{Cong} \rangle & \dfrac{x \sim y,\ y \sim x,\ Expr(x) \in \mathcal{C}}{\langle \emptyset, \{Expr(y/x)\} \rangle} & \langle \text{Comm} \rangle & \dfrac{R_{\ast}xyz \in \mathcal{C}}{\langle \emptyset, \{R_{\ast}yxz\} \rangle} \\[2ex]
\langle \text{Unit 1} \rangle & \dfrac{Expr(x) \in \mathcal{F} \cup \mathcal{C}}{\langle \emptyset, \{Ec_i,\ R_{\ast}xc_ix\} \rangle} & \langle \text{Unit 2} \rangle & \dfrac{R_{\ast}xyz,\ Ey \in \mathcal{C}}{\langle \emptyset, \{x \sim z\} \rangle}
\end{array}$$

with c_i a fresh label and Expr(x) any expression in which x occurs.

#### **Fig. 3.** Shared rules for the tableaux systems.

derivations in our calculi are implicit attempts to construct a countermodel for the formula φ to be proved. This is done via the derivation of syntactic expressions that give partial specifications of a (B)BI model that can be realized as a real model if the formula is invalid. If every possible countermodel construction (i.e., every branch of a tableau) results in a contradiction, then we may conclude that no countermodel exists and call such a tableau a proof of φ.

The calculi work with two types of syntactic expression. First we have *labelled formulae* Sφ : x, given by a sign S ∈ {T, F} together with a (B)BI formula φ and a *label* x ∈ {c_i | i ∈ ℕ}. A labelled formula states that a (B)BI formula φ is true (T) or false (F) at the state represented by the label x. The other type are called *constraints*, and encode a partial specification of the structure of a (B)BI frame. For labels x, y, z ∈ {c_i | i ∈ ℕ}, a constraint is an expression of the form x ∼ y, R∗xyz or Ex, corresponding, respectively, to the state represented by x being ≤ that represented by y, the state represented by z being a composition of those represented by x and y, or the state represented by x being a unit.

Unlike other bunched logic tableaux systems, we only utilize atomic labels, as opposed to a monoidal algebra of labels that encodes properties of the multiplicative connectives. New constraints are derived only by *frame expansion rules* (which directly reflect the axioms that define (B)BI frames and equality), rather than through the properties of a label algebra and a separate proof system for constraints. A *constrained set of statements* (CSS) is a pair F, C, where <sup>F</sup> is a set of labelled formulae and C is a set of constraints. It is finite if F and C are.

Informally, tableaux are trees annotated with finite CSSs. Each branch determines a CSS F, C where <sup>F</sup> (respectively <sup>C</sup>) is the union of the formula (constraint) sets that occur on the branch. Figures 3 and 4 give rules dictating the expansion of tableaux: Fig. 3 gives rules shared by both the BI and BBI systems, while Fig. 4 gives rules exclusive to each system. While ci, c<sup>j</sup> , c<sup>k</sup> denote concrete fresh labels, x, y, z etc. are *label variables*. An instance of a rule is triggered for a branch CSS when a concrete substitution instance of the premiss holds of it, and the same label substitutions carry through to the (branching) CSS(s) that the conclusion dictates are added to the tree. We now define (B)BI tableaux formally, with ⊕ giving concatenation of lists.

**Logical expansion rules for BI**

$$\begin{array}{ll}
\langle \mathsf{T}{\to} \rangle \ \dfrac{\mathsf{T}\phi \to \psi : x \in \mathcal{F} \text{ and } x \sim y \in \mathcal{C}}{\langle \{\mathsf{F}\phi : y\}, \emptyset \rangle \mid \langle \{\mathsf{T}\psi : y\}, \emptyset \rangle} &
\langle \mathsf{F}{\to} \rangle \ \dfrac{\mathsf{F}\phi \to \psi : x \in \mathcal{F}}{\langle \{\mathsf{T}\phi : c_i,\ \mathsf{F}\psi : c_i\}, \{x \sim c_i\} \rangle} \\[2.5ex]
\langle \mathsf{T}{\ast} \rangle \ \dfrac{\mathsf{T}\phi \ast \psi : x \in \mathcal{F}}{\langle \{\mathsf{T}\phi : c_i,\ \mathsf{T}\psi : c_j\}, \{R_{\ast}c_ic_jc_k,\ c_k \sim x\} \rangle} &
\langle \mathsf{F}{\ast} \rangle \ \dfrac{\mathsf{F}\phi \ast \psi : x \in \mathcal{F} \text{ and } R_{\ast}yzw,\ w \sim x \in \mathcal{C}}{\langle \{\mathsf{F}\phi : y\}, \emptyset \rangle \mid \langle \{\mathsf{F}\psi : z\}, \emptyset \rangle} \\[2.5ex]
\langle \mathsf{T}{\multimap} \rangle \ \dfrac{\mathsf{T}\phi \multimap \psi : x \in \mathcal{F} \text{ and } x \sim w,\ R_{\ast}wyz \in \mathcal{C}}{\langle \{\mathsf{F}\phi : y\}, \emptyset \rangle \mid \langle \{\mathsf{T}\psi : z\}, \emptyset \rangle} &
\langle \mathsf{F}{\multimap} \rangle \ \dfrac{\mathsf{F}\phi \multimap \psi : x \in \mathcal{F}}{\langle \{\mathsf{T}\phi : c_i,\ \mathsf{F}\psi : c_j\}, \{x \sim c_k,\ R_{\ast}c_kc_ic_j\} \rangle}
\end{array}$$

$$\langle \text{Assoc} \rangle \quad \frac{t \sim t',\ R_{\ast}xyt,\ R_{\ast}t'zw \in \mathcal{C}}{\langle \emptyset, \{c_i \sim c_j,\ c_k \sim w,\ R_{\ast}yzc_i,\ R_{\ast}xc_jc_k\} \rangle} \qquad \langle \text{Up} \rangle \quad \frac{Ex,\ x \sim y \in \mathcal{C}}{\langle \emptyset, \{Ey\} \rangle}$$

**Logical expansion rules for BBI**

$$\begin{array}{ll}
\langle \mathsf{T}{\neg} \rangle \ \dfrac{\mathsf{T}\neg\phi : x \in \mathcal{F}}{\langle \{\mathsf{F}\phi : x\}, \emptyset \rangle} &
\langle \mathsf{F}{\neg} \rangle \ \dfrac{\mathsf{F}\neg\phi : x \in \mathcal{F}}{\langle \{\mathsf{T}\phi : x\}, \emptyset \rangle} \\[2.5ex]
\langle \mathsf{T}{\to} \rangle \ \dfrac{\mathsf{T}\phi \to \psi : x \in \mathcal{F}}{\langle \{\mathsf{F}\phi : x\}, \emptyset \rangle \mid \langle \{\mathsf{T}\psi : x\}, \emptyset \rangle} &
\langle \mathsf{F}{\to} \rangle \ \dfrac{\mathsf{F}\phi \to \psi : x \in \mathcal{F}}{\langle \{\mathsf{T}\phi : x,\ \mathsf{F}\psi : x\}, \emptyset \rangle} \\[2.5ex]
\langle \mathsf{T}{\ast} \rangle \ \dfrac{\mathsf{T}\phi \ast \psi : x \in \mathcal{F}}{\langle \{\mathsf{T}\phi : c_i,\ \mathsf{T}\psi : c_j\}, \{R_{\ast}c_ic_jx\} \rangle} &
\langle \mathsf{F}{\ast} \rangle \ \dfrac{\mathsf{F}\phi \ast \psi : x \in \mathcal{F} \text{ and } R_{\ast}yzx \in \mathcal{C}}{\langle \{\mathsf{F}\phi : y\}, \emptyset \rangle \mid \langle \{\mathsf{F}\psi : z\}, \emptyset \rangle} \\[2.5ex]
\langle \mathsf{T}{\multimap} \rangle \ \dfrac{\mathsf{T}\phi \multimap \psi : x \in \mathcal{F} \text{ and } R_{\ast}xyz \in \mathcal{C}}{\langle \{\mathsf{F}\phi : y\}, \emptyset \rangle \mid \langle \{\mathsf{T}\psi : z\}, \emptyset \rangle} &
\langle \mathsf{F}{\multimap} \rangle \ \dfrac{\mathsf{F}\phi \multimap \psi : x \in \mathcal{F}}{\langle \{\mathsf{T}\phi : c_i,\ \mathsf{F}\psi : c_j\}, \{R_{\ast}xc_ic_j\} \rangle}
\end{array}$$

$$\langle \text{Assoc} \rangle \qquad \qquad \qquad \frac{R\_\* x y t, R\_\* t z w \in \mathcal{C}}{\langle \emptyset, \{R\_\* y z c\_i, R\_\* x c\_i w\} \rangle} \qquad \qquad \qquad \qquad \langle \text{Sym} \rangle \qquad \qquad \qquad \frac{x \sim y \in \mathcal{C}}{\langle \emptyset, \{y \sim x\} \rangle}$$

with c*i*, c*<sup>j</sup>* , c*<sup>k</sup>* fresh labels, Expr(x) any expression in which x occurs.

**Fig. 4.** Tableaux rules for (B)BI

**Definition 1 (Tableau).** *A* (B)BI tableau *for a finite CSS* ⟨F_0, C_0⟩ *is a list of CSSs, called* branches*, built inductively according to the following rules:*


$$\frac{Premiss}{\langle \mathcal{F}_1, \mathcal{C}_1 \rangle \mid \dots \mid \langle \mathcal{F}_k, \mathcal{C}_k \rangle}$$

*is a (B)BI expansion rule from Fig. 3 or 4 for which a concrete instance of* Premiss *is fulfilled by* ⟨F, C⟩*, then the list* T_m ⊕ [⟨F ∪ F_1, C ∪ C_1⟩; ... ; ⟨F ∪ F_k, C ∪ C_k⟩] ⊕ T_n *is a tableau for* ⟨F_0, C_0⟩*.*

*A* (B)BI tableau for φ *is a (B)BI tableau for* ⟨{Fφ : c_0}, ∅⟩*.*


**Fig. 5.** Separation properties.

**Definition 2 (Closed Tableau/Proof).** *A CSS* ⟨F, C⟩ *is* closed *if one of the following* closure conditions *holds: (1)* Tφ : x ∈ F*,* Fφ : y ∈ F *and* x ∼ y ∈ C*; (2)* F⊤ : x ∈ F*; (3)* T⊥ : x ∈ F*; (4)* FI : x ∈ F *and* Ex ∈ C*. A CSS is* open *iff it is not closed. A tableau is closed iff all its branches are closed. A* proof *for a formula* φ *is a closed tableau for* φ*.*

We note that we could simply add ⟨T¬⟩, ⟨F¬⟩, and ⟨Sym⟩ to the BI system and obtain one for BBI. However, this causes a significant amount of redundancy in the production of labels and constraints, while requiring many more derivation steps in proofs, something that does not arise with the BBI rules given.
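To make the mechanics of expansion concrete, here is a small sketch of ours (not the authors' implementation) of a branch CSS as a pair of sets and one application of the BBI rule ⟨T∗⟩, which introduces fresh labels c_i, c_j and the constraint R∗c_ic_jx:

```python
from itertools import count

fresh = count(1)   # fresh-label generator; c0 is reserved for the root

def expand_T_star(branch, phi, psi, x):
    """One application of <T*> (BBI form) to T phi*psi : x on a branch
    <F, C>: add T phi : c_i and T psi : c_j to F, and the constraint
    R* c_i c_j x to C, with c_i, c_j fresh."""
    F, C = branch
    ci, cj = f'c{next(fresh)}', f'c{next(fresh)}'
    return (F | {('T', phi, ci), ('T', psi, cj)},
            C | {('R', ci, cj, x)})

# Root CSS for refuting p*q at label c0, then one <T*> step.
F0 = {('T', ('star', 'p', 'q'), 'c0')}
F1, C1 = expand_T_star((F0, set()), 'p', 'q', 'c0')
assert ('T', 'p', 'c1') in F1 and ('R', 'c1', 'c2', 'c0') in C1
```

A full prover would iterate such rule applications over every branch, checking the closure conditions of Definition 2 after each step.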

**Extension with Separation Theories.** A *separation property* is a first-order axiom in the language of (B)BI Kripke frames. Figure 5 gives separation properties taken from across the Separation Logic literature [10,13,14,24], presented as coherent formulae. A *separation theory* is thus a collection Σ of axioms from Fig. 5. The syntactic form of coherent formulae enables a uniform translation of separation properties into tableaux expansion rules and closure conditions. First, each first-order atomic formula is translated into constraints: Tr(z ∈ x ◦ y) = R∗xyz, Tr(x ∈ E) = Ex, Tr(x ≤ y) = x ∼ y and Tr(x = x') = x ∼ x', x' ∼ x. Given A_1(x) ∧ ⋯ ∧ A_n(x) → ∃y_1 B_1(x, y_1) ∨ ⋯ ∨ ∃y_m B_m(x, y_m) with n, m ≠ 0, we obtain the frame expansion rule

$$\frac{Tr(A\_1(\vec{x})), \dots, Tr(A\_n(\vec{x})) \in \mathcal{C}}{\langle \emptyset, \mathcal{C}\_1 \rangle \mid \dots \mid \langle \emptyset, \mathcal{C}\_m \rangle},$$

where each Ci is the set of constraints translated from the conjuncts of Bi, using fresh labels c⃗i in place of the previously quantified y⃗i. For example, the separation properties Cross-Split and Non-Branching are translated to the rules

$$\frac{R_{*}tux,\ R_{*}vwx \in \mathcal{C}}{\langle \emptyset, \{R_{*}c_ic_jt,\ R_{*}c_kc_lu,\ R_{*}c_ic_kv,\ R_{*}c_jc_lw\} \rangle} \quad \text{and} \quad \frac{x \sim y,\ x \sim y' \in \mathcal{C}}{\langle \emptyset, \{y \sim y'\} \rangle \mid \langle \emptyset, \{y' \sim y\} \rangle},$$

where ci, cj, ck, cl are fresh labels. The special case n = 0 gives a rule with premiss Expr1(x1), . . . , Exprp(xp) ∈ F ∪ C, where each Expri(xi) is *any* expression in which xi occurs and the xi are the universally quantified variables in the original formula. The case m = 0 gives a new closure condition consisting of the conjunction of constraints translated from the antecedent of the original formula.

Note that the property Splittability is defined by a *system* of coherent axioms. These axioms force the new predicate Ē to be interpreted as the complement of E. When translated into tableaux rules, x ∈ Ē gives a new constraint Ēx.

Given a separation theory Σ, a *(B)BI +* Σ-*tableau/proof* is defined in the same way as Definitions 1 and 2, except that a tableau can also be expanded by translated Σ-rules, and any new closure properties obtained from Σ can factor into the closure of a tableau and thus into proofs.

We give an example of a tableau proof in Fig. 6. The formula (¬I −∗ ⊥) → I is valid in BBI models satisfying Total, but not in all BBI models [35], and Fig. 6—written, for clarity, using the traditional representation of tableaux and using ⊗ to denote closed branches—shows that the tableaux system for BBI + Total proves it. The left-hand branch is closed because FI : c0, TI : c0 and c0 ∼ c0 all occur, while the right is closed because T⊥ : c1 occurs.

### **4 Applications to Separation Logics**

A *separation logic* can be determined by an assertion logic to describe machine state—a theory of (B)BI generated by validity in a concrete model of (B)BI + Σ for some separation theory Σ—and a specification logic to describe changes to machine state following program execution—typically a logic of Hoare triples {φ}C{ψ}, where <sup>φ</sup> and <sup>ψ</sup> are formulas of the assertion language and <sup>C</sup> is a program in some programming language. Soundness of the *frame rule*,

$$\frac{\{\phi\} \, C \, \{\psi\}}{\{\phi \ast \chi\} \, C \, \{\psi \ast \chi\}},$$

where χ does not include any free variables modified by the program C, witnesses the coherence of these different aspects, and facilitates Separation Logic's characteristic 'local reasoning', which allows conclusions about a program's effect on the global state to be derived from reasoning on just the resource it accesses.

(Fig. 6: the derivation opens with the premiss ⟨{F(¬I −∗ ⊥) → I : c0}, ∅⟩, applies ⟨F→⟩ to obtain T¬I −∗ ⊥ : c0 and FI : c0, adds a composition constraint by Total, and then branches by ⟨T−∗⟩; both resulting branches close (⊗).)

**Fig. 6.** Tableau proof of (¬I −∗ ⊥) → I in the BBI + Total system.

To demonstrate the wide applicability of our framework we now give a number of separation logics that are models of separation theories. We note that our systems can be incomplete with respect to a given concrete model, but this is as expected for any proof system: the benefit over a standard (B)BI system (which will be incomplete with respect to the class of models of a given separation theory) is the capability to make inferences based on the additional structure the model carries. Because of space constraints this selection is demonstrative rather than exhaustive. Other examples include Petri nets [13]; step-indexed models for storable locks [11] and the Iris framework [33]; separation logics incorporating named [42] and fractional [7] permissions; and separation logics designed for message passing [49] and amortized resource analysis [3].

**Heaps.** Our first example is given by the standard memory models of Separation Logic [32]. A *heap* is a partial function h : N → Z, representing an allocation of memory addresses to values. Given heaps h, h′, h#h′ denotes that dom(h) ∩ dom(h′) = ∅; h · h′ denotes the union of functions with disjoint domains, which is defined iff h#h′. The *empty heap*, [], is defined nowhere.
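As a concrete illustration, the partial composition h · h′ can be sketched in Python; the encoding (dicts as heaps, `None` for "undefined") and the names `compose` and `EMPTY` are ours, not from the paper:

```python
# Sketch of the classical heap model: a heap is a finite partial map,
# here a dict from addresses (int) to values (int).

def disjoint(h1, h2):
    """h1 # h2: the two domains share no address."""
    return not (h1.keys() & h2.keys())

def compose(h1, h2):
    """h1 . h2: union of heaps with disjoint domains; None when undefined."""
    if not disjoint(h1, h2):
        return None
    return {**h1, **h2}

EMPTY = {}  # the empty heap [], defined nowhere
```

For instance, `compose({1: 10}, {2: 20})` yields `{1: 10, 2: 20}`, while `compose({1: 10}, {1: 5})` is undefined (`None`); `EMPTY` is a unit for composition.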

Let H denote the set of all heaps. Then HeapBBI = (H, ·, {[]}) is a BBI frame. Letting h ⊑ h′ denote that h′ extends h, HeapBI = (H, ⊑, ·, H) defines a BI frame. These frames generate the standard classical and intuitionistic models of Separation Logic. HeapBBI satisfies Partial Determinism, Cancellativity, Single Unit, Indivisible Units, Cross-Split and Unit Self Joining; HeapBI additionally satisfies Splittability, Upwards-Closed, Downwards-Closed, Increasing and Normal Increasing while dropping Single Unit and Unit Self Joining.

One property distinguishing the standard memory models is ∗-elimination—φ ∗ ψ → ψ, useful for reasoning about garbage-collected languages—which is valid in the intuitionistic heap model but not the classical one. Cao et al. [14] show that this corresponds to the separation property Increasing. Figure 7—written with a traditional tableau presentation—shows a single-branch tableau proof of φ ∗ ψ → ψ for BI + Increasing, closed because Tψ : c4, Fψ : c1 and c4 ∼ c1 occur.

**Permissions.** Permissions are incorporated into variants of separation logics that are designed to reason about certain kinds of concurrent algorithms and more fine-grained notions of memory disjointness: for example, disjointness modulo shared read permission. Hóu [29] reports a schema of Clouston that encompasses many such models: we recall it, with two concrete instances.

Let V be a set of values and ⋆ : V² → V an associative and commutative partial function. Denote by HV the set of V-valued heaps h : N → V. Then HeapV = (HV, ◦⋆, {[]}) is a BBI frame, where ◦⋆ is defined by

$$h\_1 \circ\_\star h\_2(n) = \begin{cases} h\_1(n) \star h\_2(n) & \text{if } n \in dom(h\_1) \cap dom(h\_2) \text{ and } h\_1(n) \star h\_2(n) \downarrow \\ h\_1(n) & \text{if } n \in dom(h\_1) \backslash dom(h\_2) \\ h\_2(n) & \text{if } n \in dom(h\_2) \backslash dom(h\_1) \\ \text{undefined} & \text{otherwise.} \end{cases}$$
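The composition ◦⋆ can be rendered as a small sketch; this is a hypothetical Python transcription (names ours), reading the "undefined" clause as making the whole composition undefined when ⋆ fails at a shared address:

```python
# Sketch of h1 *. h2 for V-valued heaps (dicts from int to values).
# `star` is the partial binary operation on V, returning None when undefined.

def compose_star(star, h1, h2):
    out = {}
    for n in h1.keys() | h2.keys():
        if n in h1 and n in h2:
            v = star(h1[n], h2[n])
            if v is None:
                return None  # h1(n) * h2(n) undefined at a shared address
            out[n] = v
        elif n in h1:
            out[n] = h1[n]   # n in dom(h1) \ dom(h2)
        else:
            out[n] = h2[n]   # n in dom(h2) \ dom(h1)
    return out
```

Instantiating `star` with any partial value operation (e.g., a permission join) gives the corresponding permission-heap frame.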

$$\begin{array}{lll}
(1) & \langle \{\mathrm{F}\phi * \psi \to \psi : c_0\}, \emptyset \rangle & \text{Premiss} \\
(2) & \langle \{\mathrm{T}\phi * \psi : c_1, \mathrm{F}\psi : c_1\}, \{c_0 \sim c_1\} \rangle & \langle \mathrm{F}{\to} \rangle, \text{ from (1)} \\
(3) & \langle \{\mathrm{T}\phi : c_3, \mathrm{T}\psi : c_4\}, \{R_{*}c_3c_4c_2,\ c_2 \sim c_1\} \rangle & \langle \mathrm{T}{*} \rangle, \text{ from (2)} \\
(4) & \langle \emptyset, \{c_4 \sim c_2\} \rangle & \text{Increasing, from (3)} \\
(5) & \langle \emptyset, \{c_4 \sim c_1\} \rangle & \text{Trans, from (3), (4)} \\
 & \otimes &
\end{array}$$

**Fig. 7.** Tableau proof of φ ∗ ψ → ψ in the BI + Increasing system.

Hóu defines Bornat et al.'s [6] *counting permissions model* with V = Z² and

$$(x,i)\star(y,j) = \begin{cases} (x,i+j) & \text{if } x=y, i<0 \text{ and } j<0\\ (x,i+j) & \text{if } x=y, i+j \ge 0 \text{ and } (i<0 \text{ or } j<0) \\ \text{undefined} & \text{otherwise.} \end{cases}$$

This frame satisfies Partial Determinism, Cancellativity, Indivisible Units, Single Unit, Cross-Split and Unit Self Joining.
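The counting-permissions operation ⋆ can be transcribed directly; the following sketch (the function name `count_star` is ours) encodes undefinedness as `None`:

```python
# Sketch of * for the counting permissions model, V = Z x Z.
# Intuitively, negative counts correspond to split-off read permissions [6].

def count_star(a, b):
    (x, i), (y, j) = a, b
    if x != y:
        return None            # values must agree
    if i < 0 and j < 0:
        return (x, i + j)      # combining two read permissions
    if i + j >= 0 and (i < 0 or j < 0):
        return (x, i + j)      # returning read permissions to a source
    return None                # undefined otherwise
```

For example, two read permissions on the same value combine, `count_star((5, -1), (5, -1)) == (5, -2)`, while two sources do not: `count_star((5, 1), (5, 1)) is None`.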

Hóu defines Dockins et al.'s [24] *binary tree model* by considering the set T of non-empty binary trees with leaves labelled ⊤ or ⊥, quotiented by the smallest congruence that identifies any subtree in which all leaves have the same label with a single leaf carrying that label. Then V = Z × T, and ⋆ is defined, where ∨ (∧) denotes pointwise disjunction (conjunction) of equivalent trees, by

$$(x, [t]) \star (y, [t']) = \begin{cases} (x, [t \vee t']) & \text{if } x = y \text{ and } [t \wedge t'] = [\bot] \\ \text{undefined} & \text{otherwise.} \end{cases}$$

This frame satisfies Partial Determinism, Cancellativity, Single Unit, Indivisible Units, Disjointness, Splittability, Cross-Split and Unit Self Joining.
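A hypothetical executable reading of the tree model is sketched below; the encoding is ours (a tree is a `bool` leaf, ⊤ as `True`, or a pair of subtrees, with `norm` computing the canonical representative of the quotient):

```python
# Sketch of Dockins et al.'s binary tree share model.

def norm(t):
    """Canonical representative: collapse any subtree whose leaves all agree."""
    if isinstance(t, bool):
        return t
    l, r = norm(t[0]), norm(t[1])
    if isinstance(l, bool) and l == r:
        return l
    return (l, r)

def pointwise(op, t, u):
    """Apply op leafwise, unfolding a leaf against a branching subtree."""
    if isinstance(t, bool) and isinstance(u, bool):
        return op(t, u)
    t = (t, t) if isinstance(t, bool) else t
    u = (u, u) if isinstance(u, bool) else u
    return (pointwise(op, t[0], u[0]), pointwise(op, t[1], u[1]))

def tree_star(a, b):
    """(x, [t]) * (y, [t']): defined when x = y and the shares do not overlap."""
    (x, t), (y, u) = a, b
    if x != y or norm(pointwise(lambda p, q: p and q, t, u)) is not False:
        return None
    return (x, norm(pointwise(lambda p, q: p or q, t, u)))
```

For instance, the two complementary half-shares of a value join to the full share: `tree_star((0, (True, False)), (0, (False, True))) == (0, True)`.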

**Crash Hoare Logic.** Chen et al. [16] use a separation logic to verify that the FSCQ file system meets its specification and secures its data under any sequence of crashes. Cao et al. [14] give the underlying model as the following BI frame. Let V⁺ be the set of non-empty lists over a set V and ε the empty list. Buffer heaps are defined to be heaps h : N → V⁺. Let Hbuff be the set of all buffer heaps. Then Heapbuff = (Hbuff, ≤, ·, {[]}) is a BI frame, where · is the usual heap composition, and h1 ≤ h2 iff dom(h1) = dom(h2) and ∀x ∈ N, ∃l ∈ V⁺ ∪ {ε} such that h1(x) = l ⊕ h2(x). This frame satisfies Partial Determinism, Cancellativity, Single Unit, Indivisible Units, Cross-Split, Upwards-Closed, Downwards-Closed, Always-Joins, Non-Branching, Unit Self Joining, and Normal Increasing.

**Typed Heaps.** Cao et al. [14] give an example derived from the handling of multibyte locks in Appel's [2] Verified Software Toolchain separation logic for CompCert C. Let a *typed heap* be a partial map h : N → {char, short1, short2} such that h(n) = short1 implies h(n + 1) = short2. Let Htyp denote the set of all typed heaps. Then HeapTyp = (Htyp, ≤, ◦, Htyp) is a BI frame, where h1 ≤ h2 iff, for all n ∈ dom(h1), either n ∈ dom(h2) and h1(n) = h2(n) or h1(n) = char, and h ∈ h1 ◦ h2 iff h1 · h2 ≤ h. This frame satisfies Indivisible Units, Disjointness, Splittability, Cross-Split, Upwards-Closed, Downwards-Closed, Non-Branching, Increasing, and Normal Increasing.

### **5 Metatheory**

**Tableaux Systems as Coherent Theories.** Just as coherent formulae yield tableaux rules, tableaux rules yield coherent formulae, allowing a complete specification of our calculi as coherent theories. Our framework determines a first-order signature: for each formula φ of (B)BI, we have unary relation symbols Tφ and Fφ, together with the unary relation symbol E, the binary relation symbol ∼ and the ternary relation symbol R∗.

Given a rule premiss 'Sφ : x ∈ F and A1x¹1 . . . x¹k1, . . . , Amxm1 . . . xmkm ∈ C' we obtain the coherent antecedent C(x⃗) ≡ Sφ(x) ∧ ⋀i Aixi1 . . . xiki. For the j-th conclusion ⟨Fj, Cj⟩ of the rule we obtain ∃y⃗jCj(x⃗, y⃗j), where Cj is the conjunction of atomic formulae translated from the constraints in Fj ∪ Cj, with any fresh labels c⃗ that occurred substituted with y⃗j. The translated rule is thus C(x⃗) → ∃y⃗1C1(x⃗, y⃗1) ∨ ··· ∨ ∃y⃗nCn(x⃗, y⃗n). For example, the instance of the BI rule F−∗ for φ −∗ ψ becomes Fφ−∗ψ(x) → ∃y1, y2, y3 (Tφ(y2) ∧ Fψ(y3) ∧ x ∼ y1 ∧ R∗y1y2y3).

There are some special cases to pay attention to. For tableaux rules with premiss Expr(x) ∈ F ∪ C, the antecedent of the translated coherent formula is ⊤. This is not the case for rules with premiss Expr(x) ∈ C: these must be translated into a separate rule for each of the finitely many ways x can occur in each constraint. Finally, each closure condition 'S1φ1 : x1, . . . , Snφn : xn, A1y¹1 . . . y¹k1, . . . , and Amym1 . . . ymkm' gives ⋀i Siφi(xi) ∧ ⋀i Aiyi1 . . . yiki → ⊥.

Given a (B)BI formula φ, the finite coherent theory Φ^{(B)BI+Σ}_φ is given by the translated (B)BI + Σ-frame expansion rules, the translated closure conditions and the instances of translated logical expansion rules for subformulae of φ. We note that we could specify the whole tableaux system for (B)BI + Σ as an infinite coherent theory (similar to the axiomatization of a Hintikka set in standard tableaux completeness proofs), but finiteness is required for our argument.

**Soundness and Completeness.** We now prove soundness and completeness of the tableaux method via an analogous result for the Tarskian semantics of coherent logic. First, we show that the existence of a Kripke (B)BI + Σ-model with a state that does not satisfy φ is equivalent to the existence of a Tarskian model of Φ^{(B)BI+Σ}_φ ∪ {∃x.Fφ(x)}.

**Definition 3 (Induced Kripke Model of** M**).** *Given a Tarskian model* M *of* Φ^{(B)BI+Σ}_φ*, define* [a] = {b | a ∼I b *and* b ∼I a} *and* XM = {[a] | a ∈ X}*. Then* [a] ≤M [b] *iff* a ∼I b*,* [c] ∈ [a] ◦M [b] *iff* R∗I abc*, and* EM = {[a] | EIa}*.* VM(p) = {[a] | ∃b(b ∼I a *and* TpI(b))}*.*


The induced Kripke frame is a well-defined structure because of the frame tableaux rules, with [−] forming equivalence classes and ≤M, ◦M, and EM independent of the choice of representatives due to Cong. The (B)BI + Σ-frame properties for the induced frame follow from their corresponding rules in the tableaux, and the valuation VM is independent of the choice of representative and persistent for induced Kripke BI + Σ-models.

**Lemma 1.** *Given a Tarskian model* M *of* Φ^{(B)BI+Σ}_φ*, the induced Kripke model* XM *is a Kripke* (B)BI + Σ*-model.*

The significance of this model is that satisfiability of subformulae ψ of φ is determined by the interpretation of the relation symbols Sψ in the original Tarskian model. A simple proof by induction yields the next lemma.

**Lemma 2.** *Let* M *be a Tarskian model of the coherent theory* Φ^{(B)BI+Σ}_φ*,* ψ *a subformula of* φ *and* a ∈ X*. 1. If* TψI(a) *holds in* M*, then* [a] ⊨VM ψ*; 2. If* FψI(a) *holds in* M*, then* [a] ⊭VM ψ*.*

We can also induce Tarskian models from Kripke models. Let (X, V) be a Kripke (B)BI + Σ-model. We define the induced Tarskian model by taking X to be the carrier, and defining the interpretation I by ∼I = ≤, R∗I = {(a, b, c) | c ∈ a ◦ b}, EI = E, TψI = {x | x ⊨V ψ} and FψI = {x | x ⊭V ψ}.

**Lemma 3.** *Every Kripke* (B)BI + Σ*-model* (X, V) *with a state* x *(not) satisfying* φ *induces a model of* Φ^{(B)BI+Σ}_φ ∪ {∃x.Tφ(x)} *(respectively,* Φ^{(B)BI+Σ}_φ ∪ {∃x.Fφ(x)}*).*

We now connect the existence of a closed tableau to Bezem and Coquand's [4] *breadth-first forward reasoning* proof system for coherent logic. In their system, judgments of the form X ⊢Φ D are derived, where X is a set of atomic first-order sentences, Φ a finite coherent theory and D a *closed coherent disjunction*: a first-order sentence with the same syntactic shape as the consequent of a coherent formula. The derivation of the judgment X ⊢Φ D is defined inductively:


A derivation can be seen as a kind of tableau, branching at each stage by adding every possible consequence of Φ obtainable from the atomic first-order sentences at the current node. A semi-decidable procedure is given to systematically search for a derivation of X ⊢Φ D. First check the base case. If it does not hold, apply the inductive step to any Φ-axioms fireable from X. If there are none, X forms a Herbrand countermodel of Φ against D. If the inductive step can be applied, apply the search procedure recursively to all premisses. Bezem and Coquand show that successful termination corresponds to Tarskian truth.
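For ground (variable-free) coherent theories, this search can be sketched as a short recursive function; the following is our own illustrative rendering, ignoring the fresh-constant machinery needed for existentials:

```python
# Sketch of Bezem-Coquand breadth-first forward reasoning, ground case.
# A rule is (antecedent, alternatives): a frozenset of facts and a list of
# frozensets of facts; an empty list of alternatives encodes a bottom-consequent.

def derivable(facts, rules, goal_alts):
    """Search for a derivation of X |- D, with D given as a list of disjuncts."""
    if any(alt <= facts for alt in goal_alts):
        return True  # base case: some disjunct of D already holds in X
    for ante, alts in rules:
        # fire a rule whose antecedent holds but whose consequent does not yet
        if ante <= facts and not any(alt <= facts for alt in alts):
            # all() over an empty list is True: a fired bottom-rule closes the branch
            return all(derivable(facts | alt, rules, goal_alts)
                       for alt in alts)
    return False  # no rule fires: `facts` is a Herbrand countermodel of D
```

For example, with rules p → q ∨ r, q → ⊥ and r → ⊥, the judgment {p} ⊢ ⊥ is derivable, mirroring a closed two-branch tableau; firing any fireable rule is harmless here since adding fired facts preserves derivability.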

**Theorem 1 (**[4]**).** X ⊢Φ D *is derivable iff the search procedure successfully terminates for* X ⊢Φ D *iff* D *is true in all Tarskian models of* X ∪ Φ*.*

It is straightforward that the search procedure for {Fφ(a)} ⊢_{Φ^{(B)BI+Σ}_φ} ⊥ corresponds precisely to an exhaustive search for a closed tableau for φ.

**Lemma 4.** *There exists a closed* (B)BI + Σ*-tableau for* φ *iff the search procedure for* {Fφ(a)} ⊢_{Φ^{(B)BI+Σ}_φ} ⊥ *successfully terminates.*

Hence if a closed (B)BI + Σ-tableau does not exist for φ, there exists a Tarskian model M of Φ^{(B)BI+Σ}_φ ∪ {∃x.Fφ(x)}. By Lemma 2, the induced Kripke model XM has a state [a] such that [a] ⊭VM φ, establishing that φ fails to be valid for Kripke (B)BI + Σ-models. Conversely, if a closed tableau does exist, then there is no Tarskian model M of Φ^{(B)BI+Σ}_φ ∪ {∃x.Fφ(x)}. By Lemma 3, φ is valid in Kripke (B)BI + Σ-models, as otherwise any countermodel would generate a Tarskian model M of Φ^{(B)BI+Σ}_φ ∪ {∃x.Fφ(x)}, a contradiction.

**Theorem 2 (Soundness and Completeness for** (B)BI + Σ**-Tableaux).** φ *is valid in Kripke* (B)BI + Σ*-models iff* φ *is provable in the* (B)BI + Σ*-tableaux system.*

### **6 Conclusions and Further Work**

We have given a framework of tableaux systems that exhaustively captures the breadth of separation theories in the literature. Our framework is proven sound and complete parametrically by a novel representation of tableaux systems as coherent theories that allows us to apply existing theory from coherent logic. This resolves the expressivity gap between the logics of bunched implications and the separation logics defined upon them, and provides proof theory for the assertion languages of a wide array of program logics.

The completeness of tableaux systems is usually proved by defining a notion of a *Hintikka set*: a saturated set of (labelled) formulae (and possibly constraints) that specifies a term model of the logic. The existence of a Hintikka set is then shown to follow from non-existence of a tableau proof. Our method is a generalization of this idea, implemented parametrically by choice of tableaux system. While we have focused on Separation Logic, this technique is adaptable to virtually any logic interpreted on relational structures, including the breadth of bunched and modal logics. This suggests the significance of the coherent logic fragment extends beyond the generation of proof rules for frame conditions.

The implementation of our systems is of principal importance for future work. Our tableaux representation suggests existing coherent logic provers (see [43] for a survey) may already be suitable, though tactics designed specifically for tableaux coherent theories may have to be developed to make this efficient. A closely related goal is the development of parametric Separation Logic implementations that utilize our systems as assertion language provers. Finally, our results suggest interesting theoretical work. Coherent logic has close connections to topos theory, and Caramello [15] has developed techniques to transfer results between mathematical fields via bridges between the classifying topoi of coherent theories. We wish to investigate if any results of logical interest can be found in this way by utilizing the representation of tableaux as coherent theories.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Differential Calculus with Imprecise Input and Its Logical Framework**

Abbas Edalat<sup>1</sup> and Mehrdad Maleki<sup>2</sup>

<sup>1</sup> Department of Computing, Imperial College London, London SW7 2RH, UK a.edalat@imperial.ac.uk

<sup>2</sup> Institute for Research in Fundamental Sciences (IPM), Niavaran, Tehran, Iran m.maleki@ipm.ir

**Abstract.** We develop a domain-theoretic Differential Calculus for locally Lipschitz functions on finite dimensional real spaces with imprecise input/output. The inputs to these functions are hyper-rectangles and the outputs are compact real intervals. This extends the domain of application of Interval Analysis and exact arithmetic to the derivative. A new notion of a tie for these functions is introduced, which in one dimension represents a modification of the notion previously used in the one-dimensional framework. A Scott continuous sub-differential for these functions is then constructed, which satisfies a weaker form of calculus compared to that of the Clarke sub-gradient. We then adopt a Program Logic viewpoint using the equivalence of the category of stably locally compact spaces with that of semi-strong proximity lattices. We show that given a localic approximable mapping representing a locally Lipschitz map with imprecise input/output, a localic approximable mapping for its sub-differential can be constructed, which provides a logical formulation of the sub-differential operator.

**Keywords:** Imprecise input/output · Interval analysis · Exact computation · Lipschitz maps · Clarke gradient · Domain theory · Stone duality

### **1 Introduction**

A well-known hurdle in numerical computation is caused by accumulation of round-off errors in floating point arithmetic, which can create havoc and lead to catastrophic errors in compound calculations. In safety-critical systems, where reliability of numerical computation is of utmost importance, one way to avoid the pitfalls of floating point arithmetic is to use interval analysis or exact arithmetic. In both interval analysis and exact arithmetic, as well as in computable analysis, a real number is represented by a nested shrinking sequence of compact intervals whose intersection is the real number. Similarly, a real n-vector can be represented by a nested sequence of hyper-rectangles in R<sup>n</sup>. This leads to a framework in numerical computation and a framework for computational geometry where the inputs of algorithms or programmes are imprecise real numbers or real n-vectors; see for example [3,5,6,9,10,14,15,17,21–23,27].

All frameworks for interval analysis and exact real computation are based on functions whose input and output are real intervals. When we compose two such functions, the output of the first function serves as the input to the second function. An implementation of these frameworks in a functional programming language follows this same pattern; see for example the lazy Haskell implementation of IC-Reals for Exact Real Computation [1], which uses linear fractional transformations as developed in [14,22].

An important feature of working with a calculus consisting of functions with interval or imprecise input/output is that even when we deal with elementary functions such as polynomials we cannot restrict ourselves to their canonical (maximal) extensions to intervals [21]. These canonical extensions take a compact interval to its forward image under the function. In fact, these extensions are not closed under, for example, multiplication. Thus, the real-valued map of a real variable x ↦ x², when implemented with interval input by x ↦ x × x, using multiplication of two copies of the input interval, is not the canonical extension of the quadratic map of real numbers: it evaluates, for example, [−1, 1] to [−1, 1] rather than [0, 1], which is what the canonical extension of the quadratic map evaluates to. In general, we need to work with any Scott continuous map of type **I**R → **I**R or, in higher dimension, of type **I**R<sup>n</sup> → **I**R, where **I**R<sup>n</sup> denotes the domain of hyper-rectangles of R<sup>n</sup>.
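The effect described above is easy to reproduce; in the sketch below (intervals as pairs, function names ours), interval multiplication treats the two copies of the input as independent, while the canonical extension takes the forward image:

```python
# Interval multiplication vs. the canonical extension of the squaring map.

def imul(a, b):
    """Product of two independent intervals (lo, hi)."""
    ps = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(ps), max(ps))

def square_canonical(a):
    """Forward image of the interval a under x |-> x**2."""
    lo, hi = a
    if lo <= 0 <= hi:
        return (0, max(lo * lo, hi * hi))
    return (min(lo * lo, hi * hi), max(lo * lo, hi * hi))

x = (-1, 1)
imul(x, x)           # (-1, 1): the two copies of x are treated as independent
square_canonical(x)  # (0, 1): the image of [-1, 1] under squaring
```

Both maps are monotone with respect to reverse inclusion, but only the second is the maximal extension of squaring.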

In the past 60 years, interval analysis has grown into a distinct interdisciplinary subject with impact on nearly all areas of mathematical and numerical analysis, from computer arithmetic, linear algebra, integration, and the solution of initial value problems and partial differential equations to correct solutions in mathematical optimisation and robotics; see [20]. It is natural to ask if the domain of application of interval analysis and exact computation can be extended to the derivative of functions, i.e., whether one can take a kind of derivative of a map which takes a compact interval or a compact hyper-rectangle as input.

In [11], the notion of a domain-theoretic sub-differentiation of maps which have non-empty compact intervals as inputs and outputs was introduced. The restriction of these maps to real numbers turns out to be the class of locally Lipschitz maps of type R → R, and the sub-differential restricted to real numbers has been shown to coincide with the Clarke sub-gradient [8]. A major problem, however, is that the framework in [11], which only deals with one-dimensional maps of type **I**R → **I**R, is not accompanied by a Stone duality framework and thus, even in dimension one, cannot be used to handle program logic and predicate transformers.

In [7], a typed lambda calculus in the framework of an extension of Real PCF [6,17,22] was introduced in which, in particular, continuously differentiable and, more generally, Lipschitz functions can be defined. Given an expression representing a real-valued function of a real variable in this language, one is able to evaluate the expression on an argument representing an interval, but also to evaluate the generalised derivative, i.e., the L-derivative, equivalently the Clarke gradient, of the expression on an interval. The operational semantics of the language, which is equipped with min and a weighted average, enjoys adequacy and a definability result proving that any computable Lipschitz map is definable in it. The denotational semantics is based on domain theory, which in principle allows a program logic formulation of the computation, although this challenge has not been taken up yet.

In [13], a point-free framework for sub-differentiation of real-valued locally Lipschitz functions on finite dimensional Euclidean spaces was developed, which provides a Stone duality for the Clarke gradient and thus enables a program logic view of differentiation. However, the induced logical framework cannot be employed for the class of functions with imprecise input/output used in exact computation since, as already pointed out, this class necessarily contains general extensions of real-valued locally Lipschitz maps of finite dimensional Euclidean spaces.

In this paper, we formulate a new notion of a tie of functions with imprecise input/output, which, in one dimension, represents a modification of the corresponding notion in [12]. This allows us to develop a Scott continuous sub-differential for functions with hyper-rectangles in R<sup>n</sup> as inputs and compact intervals in R as output, which are used in exact computation. We show that a weaker calculus compared to that for the Clarke sub-gradient is satisfied in this interval framework. In addition, we construct a logical framework for sub-differentiation of locally Lipschitz maps of type **I**R<sup>n</sup> → **I**R. The basic Stone duality results developed in [13] are then extended to sub-differentiation of such interval maps.

#### **1.1 Background**

We assume the reader is familiar with basic elements of topology and domain theory. Following the definition in [18], by a domain we mean a continuous dcpo (directed complete partial order). All the domains we use in this paper are bounded complete as well. By **C**(R<sup>n</sup>), we denote the domain of non-empty convex and compact subsets of R<sup>n</sup> ordered with reverse inclusion and augmented with ⊥ = R<sup>n</sup> as the bottom element. If C1, C2 ∈ **C**(R<sup>n</sup>) then the way-below relation is given by C1 ≪ C2 iff C1° ⊃ C2, where S° is the interior of the set S. By **I**R<sup>n</sup>, we denote the sub-domain of non-empty compact hyper-rectangles with faces parallel to the coordinate hyper-planes of R<sup>n</sup>. The Euclidean norm of x ∈ R<sup>n</sup> is denoted by ‖x‖.

The lattice of open subsets of a topological space X is denoted by Ω(X). The Scott topology of a domain D is, however, written as σ<sub>D</sub>. The closure of S ⊂ X is denoted by S̄. The upper topology, equivalently the Scott topology, of **C**(R<sup>n</sup>) has a basis of sets of the form

$$
\Box O = \{ C \in \mathbf{C}(\mathbb{R}^n) : C \subset O \},
$$

where O belongs to a basis of open and convex subsets of R<sup>n</sup>.

Given an open set a ⊂ X of a topological space and an element b ∈ D of a domain D, the single-step function bχ<sub>a</sub> : X → D is defined by bχ<sub>a</sub>(x) = b if x ∈ a and ⊥ otherwise. A non-empty compact real interval x is written as x = [x<sup>−</sup>, x<sup>+</sup>]. For a map f : X → Y of topological spaces, f[S] denotes the image of the set S ⊂ X.

The three operations of addition of two vectors, scalar multiplication of a vector by a real number, and the inner product of two vectors can be extended to **C**(R<sup>n</sup>) to obtain the following three Scott continuous maps:

$$\begin{array}{l} \text{(i) } {-}+{-} : \mathbf{C}(\mathbb{R}^{n}) \times \mathbf{C}(\mathbb{R}^{n}) \to \mathbf{C}(\mathbb{R}^{n}) \text{ with } A + B = \{a + b : a \in A,\, b \in B\}, \\ \text{(ii) } {-}\times{-} : \mathbb{R} \times \mathbf{C}(\mathbb{R}^{n}) \to \mathbf{C}(\mathbb{R}^{n}) \text{ with } rA = \{rx : x \in A\}, \text{ and} \\ \text{(iii) } {-}\cdot{-} : \mathbf{C}(\mathbb{R}^{n}) \times \mathbf{C}(\mathbb{R}^{n}) \to \mathbf{I}\mathbb{R} \text{ with } A \cdot B = \{a \cdot b : a \in A,\, b \in B\}. \end{array}$$
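Restricted to **I**R (the case n = 1), these three maps are the familiar operations of interval arithmetic. A small Python sketch using the standard endpoint formulas (the endpoint-pair representation and function names are ours):

```python
def iadd(x, y):
    """A + B = {a + b : a in A, b in B} for compact intervals."""
    return (x[0] + y[0], x[1] + y[1])

def iscale(r, x):
    """rA = {ra : a in A}; endpoints swap for negative r."""
    return (r * x[0], r * x[1]) if r >= 0 else (r * x[1], r * x[0])

def imul(x, y):
    """A . B = {a*b : a in A, b in B}: min/max over endpoint products."""
    ps = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(ps), max(ps))

assert iadd((1, 2), (3, 5)) == (4, 7)
assert iscale(-2, (1, 3)) == (-6, -2)
assert imul((-1, 2), (3, 4)) == (-4, 8)
```

All three are monotone with respect to reverse inclusion, which is the order-theoretic content of their Scott continuity.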

These three operations have well-defined restrictions to **I**R<sup>n</sup>. In addition, in this paper, we will consider their higher-order extension to sets of sets. For example, if a<sub>1</sub>, a<sub>2</sub> ∈ Ω(R) are open subsets, then □a<sub>1</sub>, □a<sub>2</sub> ∈ σ<sub>**C**(R)</sub> and we have:

$$(\Box a_1) \cdot (\Box a_2) := \{x_1 \cdot x_2 : x_1 \in \Box a_1,\ x_2 \in \Box a_2\}$$

Moreover:

**Proposition 1.** *(i) The modal operator* □ : Ω(R<sup>n</sup>) → σ<sub>**C**(R<sup>n</sup>)</sub> *preserves meets, i.e.,* □O<sub>1</sub> ∧ □O<sub>2</sub> = □(O<sub>1</sub> ∧ O<sub>2</sub>) *for all* O<sub>1</sub>, O<sub>2</sub> ∈ Ω(R<sup>n</sup>)*.*


Next, we present the notion of Clarke's sub-gradient [4]. Recall that a map f : U ⊂ R<sup>n</sup> → R, where U is an open set, is locally Lipschitz if every point in U has an open neighbourhood O ⊂ U with a constant k ≥ 0 such that |f(x) − f(y)| ≤ k‖x − y‖ for all x, y ∈ O. The generalized directional derivative of a locally Lipschitz f at x in the direction of v is defined as follows:

$$f^\diamond(x;v) = \limsup_{y \to x,\ t \to 0^+} \frac{f(y+tv) - f(y)}{t}$$

The Clarke sub-gradient of f at x, denoted by ∂f(x), is a convex and compact subset of R<sup>n</sup> defined by:

$$\partial f(x) = \{w \in \mathbb{R}^n : f^\diamond(x; v) \ge w \cdot v \text{ for all } v \in \mathbb{R}^n\}\tag{1}$$
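As a one-dimensional numerical illustration (our own sketch, not part of the development), for f(x) = |x| one can estimate the generalized directional derivative f<sup>⋄</sup>(0; v) by maximising the difference quotient over small y and t, and then recover ∂f(0) = [−1, 1] from (1):

```python
def gen_dir_deriv(f, x, v, h=1e-3, m=50):
    """Crude numerical estimate of Clarke's generalized directional
    derivative: maximise (f(y + t*v) - f(y)) / t over y near x
    and small t > 0."""
    ys = [x - h + 2 * h * i / m for i in range(m + 1)]
    ts = [h / 10, h / 100]
    return max((f(y + t * v) - f(y)) / t for y in ys for t in ts)

f = abs
d_pos = gen_dir_deriv(f, 0.0, 1.0)    # estimates f(0; 1)  = 1
d_neg = gen_dir_deriv(f, 0.0, -1.0)   # estimates f(0; -1) = 1
# By (1), the sub-gradient at 0 is {w : w*v <= f(0; v) for v = +-1},
# i.e. the interval [-d_neg, d_pos] = [-1, 1].
subgrad = (-d_neg, d_pos)
```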

The sub-gradient function ∂f : U ⊂ R<sup>n</sup> → **C**(R<sup>n</sup>) is upper semi-continuous, equivalently Scott continuous. Moreover, the Clarke sub-gradient satisfies a weak calculus. For locally Lipschitz maps f, g : U ⊆ R<sup>n</sup> → R,


(i) Sum rule: ∂f(x) + ∂g(x) ⊇ ∂(f + g)(x).

(ii) Product rule: f(x)∂g(x) + g(x)∂f(x) ⊇ ∂(fg)(x).

(iii) Chain rule: For f, g : R → R, ∂f(g(x)) · ∂g(x) ⊇ ∂(f ◦ g)(x).

The notion of the L-derivative, equivalent to the Clarke sub-gradient for real-valued functions on finite-dimensional Euclidean spaces, has the following ingredients [8]. A function f : U ⊂ R<sup>n</sup> → R has a non-empty generalized Lipschitz constant b ∈ **C**(R<sup>n</sup>) in a non-empty convex open set a ⊂ R<sup>n</sup> if for all x, y ∈ a we have f(x) − f(y) ∈ b · (x − y). The collection of all functions that have generalized Lipschitz constant b in a is denoted by δ(a, b) and called the tie of a with b. The collection of all single-step functions bχ<sub>a</sub> with a ⊂ U and f ∈ δ(a, b) is bounded in (U → **C**(R<sup>n</sup>)), and thus the L-derivative of f, defined as

$$\mathcal{L}f = \sup \{ b\chi_a : f \in \delta(a,b) \}$$

is a Scott continuous function. Moreover, we have Lf = ∂f.

#### **1.2 Stably Locally Compact Spaces and Semi-strong Proximity Lattices**

We recall that in geometric logic one uses the open sets of a topological space as propositions or semi-decidable properties [25,26]. If X is a topological space and Ω(X) its lattice of open sets, a propositional geometric theory is constructed as follows. For every open set a ∈ Ω(X), define a proposition P<sub>a</sub>, i.e., every open set of X provides a property or predicate. For open sets a and b with a ⊆ b stipulate: (i) P<sub>a</sub> ⊢ P<sub>b</sub>. For a family of open sets S, stipulate: (ii) P<sub>∪S</sub> ⊢ ⋁<sub>a∈S</sub> P<sub>a</sub>. For a finite family of open sets S, stipulate: (iii) ⋀<sub>a∈S</sub> P<sub>a</sub> ⊢ P<sub>∩S</sub>. The converses of (ii) and (iii) follow from (i). The nullary disjunction in (ii) is interpreted as **false** and the nullary conjunction in the converse of (iii) is interpreted as **true**, i.e., P<sub>∅</sub> ⊢ **false** and **true** ⊢ P<sub>X</sub>.

We regard x ∈ X as a model of the theory in which P<sub>a</sub> is interpreted as **true** iff x ∈ a, i.e., x ⊨ P<sub>a</sub> iff x ∈ a; in other words, a point is a model of a proposition if it is in the open set representing the proposition. It is possible that different points give rise to the same model, i.e., satisfy the same open sets, and it is also possible that a model does not arise from a point of X in this way. For the so-called sober spaces, defined below, we do have a one-to-one correspondence between points and models.

A topological space X is called *stably locally compact* [2,18] if it is sober, locally compact and if the intersection of two compact saturated sets is compact. Recall that X is sober if its points are in bijection with the completely prime filters of its lattice of open sets. (A set is saturated if it is the intersection of its open neighbourhoods.) Equivalently, X is stably locally compact if and only if its lattice of open sets is a distributive continuous lattice which is also arithmetic, i.e., its way-below relation satisfies:

$$O \ll O\_1, O\_2 \Rightarrow O \ll O\_1 \land O\_2$$

The spaces R<sup>n</sup>, **I**R<sup>n</sup> and **C**(R<sup>n</sup>) are all stably locally compact. The way-below relation on Ω(R<sup>n</sup>) is given by O<sub>1</sub> ≪ O<sub>2</sub> iff the closure of O<sub>1</sub> is compact and contained in O<sub>2</sub>, whereas the way-below relation in **C**(R<sup>n</sup>), and thus in **I**R<sup>n</sup>, is given by Proposition 1. We can obtain a finitary representation of these spaces by a sub-lattice of open sets, as we now describe.

A *semi-strong proximity lattice* [13] consists of a tuple (B; ∨, ∧, 0, 1; ≺) in which (B; ∨, ∧, 0, 1) is a distributive lattice and ≺ is a binary relation on B with ≺ = ≺ ∘ ≺ satisfying:

1. ∀a ∈ B, M ⊂<sub>f</sub> B. M ≺ a ⇔ ⋁M ≺ a.
2. ∀a ∈ B. a = 1 ⇒ a ≺ 1.
3. ∀a, a<sub>1</sub>, a<sub>2</sub> ∈ B. a ≺ a<sub>1</sub>, a<sub>2</sub> ⇔ a ≺ a<sub>1</sub> ∧ a<sub>2</sub>.
4. ∀a, x, y ∈ B. a ≺ x ∨ y ⇒ ∃x′, y′ ∈ B. x′ ≺ x & y′ ≺ y & a ≺ x′ ∨ y′.

Here, M ⊂<sub>f</sub> B means that M is a finite (possibly empty) subset of B, and M ≺ a means that ∀m ∈ M. m ≺ a.

A relation R ⊆ B<sub>1</sub> × B<sub>2</sub> between two semi-strong proximity lattices is a *localic approximable mapping* if it satisfies the axioms given in [13].


The identity approximable mapping on B is ≺<sub>B</sub>, and composition of approximable mappings is the usual composition of relations, in the same order as for functions.

Let **SL-Compact** denote the category of all stably locally compact spaces and continuous functions, and let **Semi-Strong PL** denote the category of semi-strong proximity lattices and approximable mappings. The following functors establish an equivalence between these categories [13,19].

A : **SL-Compact** → **Semi-Strong PL**  G : **Semi-Strong PL** → **SL-Compact**

Given a stably locally compact space X, fix a basis B of its topology which is closed under finite intersections, and let A(X) be the semi-strong proximity lattice based on B. Given a continuous function f : X<sub>1</sub> → X<sub>2</sub> between two stably locally compact spaces, we have a localic approximable mapping A<sub>f</sub> : A(X<sub>1</sub>) → A(X<sub>2</sub>) given by a A<sub>f</sub> b iff a ≪ f<sup>−1</sup>(b).

Given a semi-strong proximity lattice B, the spectrum spec(B) of B is the set of all prime filters of B. For x ∈ B let O<sub>x</sub> = {F ∈ spec(B) : x ∈ F}. The collection of the O<sub>x</sub>, x ∈ B, is a base of a topology on spec(B). Put

$$G(B) = \operatorname{spec}(B)$$

Given a localic approximable mapping <sup>R</sup> : <sup>B</sup><sup>1</sup> <sup>→</sup> <sup>B</sup><sup>2</sup> define,

$$G\_R: \mathsf{spec}(\mathsf{B}\_1) \to \mathsf{spec}(\mathsf{B}\_2)$$

by G<sub>R</sub>(F) = {b<sub>2</sub> ∈ B<sub>2</sub> : ∃b<sub>1</sub> ∈ F. b<sub>1</sub> R b<sub>2</sub>}. We have A<sub>G<sub>R</sub></sub> = R and G<sub>A<sub>f</sub></sub> = f. Thus, the category of semi-strong proximity lattices with approximable mappings is equivalent to the category of stably locally compact spaces and continuous functions [13].

We now construct some canonical bases of **C**(R<sup>n</sup>) and **I**R<sup>n</sup>, which provide us with the semi-strong proximity lattices by which these spaces can be represented. Let B<sup>0</sup><sub>R<sup>n</sup></sub>, respectively B<sup>0</sup><sub>U</sub> for U ⊂ R<sup>n</sup>, be any basis of R<sup>n</sup>, respectively U, that consists of bounded convex open sets and is closed under finite intersections. We let B<sub>R<sup>n</sup></sub>, respectively B<sub>U</sub>, denote the semi-strong proximity lattice generated by B<sup>0</sup><sub>R<sup>n</sup></sub>, respectively B<sup>0</sup><sub>U</sub>. This means that every element of B<sub>R<sup>n</sup></sub>, respectively B<sub>U</sub>, is a finite join of elements of B<sup>0</sup><sub>R<sup>n</sup></sub>, respectively B<sup>0</sup><sub>U</sub> [13].

It now follows, by Proposition 1, that B<sup>0</sup><sub>**C**(R<sup>n</sup>)</sub> = {□a : a ∈ B<sup>0</sup><sub>R<sup>n</sup></sub>} is a basis of the Scott topology σ<sub>**C**(R<sup>n</sup>)</sub> which is closed under finite intersections. Let B<sub>**C**(R<sup>n</sup>)</sub> be the semi-strong proximity lattice generated by B<sup>0</sup><sub>**C**(R<sup>n</sup>)</sub>. Thus, each element of the semi-strong proximity lattice B<sub>**C**(R<sup>n</sup>)</sub> is a finite join of elements of B<sup>0</sup><sub>**C**(R<sup>n</sup>)</sub>.

Finally, let T(U) be a basis of U ⊂ R<sup>n</sup> consisting of open hyper-rectangles in U with faces parallel to the coordinate planes, and let T := T(R<sup>n</sup>). Then B<sup>0</sup><sub>**I**R<sup>n</sup></sub> = {□a : a ∈ T} is a basis for σ<sub>**I**R<sup>n</sup></sub>. Using T(U), we similarly obtain a basis B<sup>0</sup><sub>**I**U</sub> for **I**U ⊂ **I**R<sup>n</sup>. Again by Proposition 1(i), these bases are closed under finite intersections. We let B<sub>**I**R<sup>n</sup></sub>, respectively B<sub>**I**U</sub>, be the semi-strong proximity lattices generated by B<sup>0</sup><sub>**I**R<sup>n</sup></sub>, respectively B<sup>0</sup><sub>**I**U</sub>. Thus, each element of B<sub>**I**R<sup>n</sup></sub>, respectively B<sub>**I**U</sub>, is a finite join of elements of B<sup>0</sup><sub>**I**R<sup>n</sup></sub>, respectively B<sup>0</sup><sub>**I**U</sub>.

The functors A and G thus provide a bijection between the two hom-sets:

$$(\mathbf{I}U \to \mathbf{I}\mathbb{R}) \;\underset{\mathcal{A}}{\overset{\mathcal{G}}{\rightleftarrows}}\; (B_{\mathbf{I}U} \to B_{\mathbf{I}\mathbb{R}})$$

and between the two hom-sets:

$$(\mathbf{I}U \to \mathbf{C}(\mathbb{R}^n)) \;\underset{\mathcal{A}}{\overset{\mathcal{G}}{\rightleftarrows}}\; (B_{\mathbf{I}U} \to B_{\mathbf{C}(\mathbb{R}^n)}).$$

These bijections are used later to deduce our Stone duality results.

#### **1.3 Related Work**

Differentiation in logical form for functions of type U ⊆ R<sup>n</sup> → R was introduced in [13]. These maps were represented by localic approximable mappings of type B<sub>U</sub> → B<sub>R</sub>, and the localic approximable mapping of the L-derivative of such a function has type B<sub>U</sub> → B<sub>**C**(R<sup>n</sup>)</sub>. The strong tie of a with b, denoted by δ<sub>s</sub>(a, b), was defined as the collection of all functions f : a ⊆ U → R such that there exist a′ ∈ B<sup>0</sup><sub>R</sub> and b′ ∈ **C**(R<sup>n</sup>) with a ≪ a′, b′ ≪ b and f ∈ δ(a′, b′).

An approximable mapping R : B<sub>U</sub> → B<sub>R</sub> has Lipschitz constant O ∈ B<sub>**C**(R<sup>n</sup>)</sub> in a ∈ B<sub>U</sub>, denoted by R ∈ Δ(a, O), if we have:

$$\forall a_1, a_2 \prec a,\ (a_1, a_2) \in \mathsf{Sep}.\ \exists a'_1, a'_2 \in B_{\mathbb{R}}.\ a_1 \, R \, a'_1,\ a_2 \, R \, a'_2,\ a'_1 - a'_2 \prec O \cdot (a_1 - a_2)$$

where the separation predicate Sep ⊂ B<sub>U</sub> × B<sub>U</sub> is given by (a<sub>1</sub>, a<sub>2</sub>) ∈ Sep if there exist a′<sub>1</sub>, a′<sub>2</sub> such that a<sub>1</sub> ≺ a′<sub>1</sub>, a<sub>2</sub> ≺ a′<sub>2</sub> and a′<sub>1</sub> ∧ a′<sub>2</sub> = 0. The strong knot Δ<sub>s</sub>(a, O) is defined as the set of approximable mappings R : B<sub>U</sub> → B<sub>R</sub> such that there exist a′ ∈ B<sub>U</sub> and O′ ∈ B<sub>**C**(R<sup>n</sup>)</sub> with a ≺ a′, O′ ≺ O and R ∈ Δ(a′, O′).

Strong ties and strong knots are dual to each other, i.e., R ∈ Δ<sub>s</sub>(a, O) iff G<sub>R</sub> ∈ δ<sub>s</sub>(a, O̅). The Lipschitzian derivative of R : B<sub>U</sub> → B<sub>R</sub> is defined as the approximable mapping

$$\mathsf{L}(R) = \sup \{ \mathcal{A}_{\overline{O}\chi_a} \, : \, R \in \Delta_s(a, O) \}$$

It turns out that L(R) = A<sub>𝓛G<sub>R</sub></sub>, and we have a weak calculus which matches that for the Clarke sub-gradient stated after Eq. (1), i.e., L(R<sub>1</sub>) + L(R<sub>2</sub>) ⊆ L(R<sub>1</sub> + R<sub>2</sub>) and R<sub>1</sub> · L(R<sub>2</sub>) + R<sub>2</sub> · L(R<sub>1</sub>) ⊆ L(R<sub>1</sub> · R<sub>2</sub>); if at least one of R<sub>1</sub> and R<sub>2</sub> is a continuously differentiable approximable mapping, then equality holds. A weak form of the chain rule, corresponding to that for the Clarke sub-gradient, also holds for the composition of approximable mappings.

### **2 L-derivative with Imprecise Inputs**

We start by defining a notion of tie for Scott continuous maps of type f : **I**U → **I**R, for an open convex subset U ⊂ R<sup>n</sup>. From now on, in the rest of the paper, we assume f : **I**U → **I**R is Scott continuous.

**Definition 1.** *Let* f : **I**U ⊆ **I**R<sup>n</sup> → **I**R*, where* U ⊂ R<sup>n</sup> *is an open set, be Scott continuous, let* a ∈ T(U) *be an open hyper-rectangle in* U*, and let* b ∈ **C**(R<sup>n</sup>)*. We say* f *has a* generalized Lipschitz constant b *in* a*, and write* f ∈ δ(□a, b)*, if we have:*

$$\forall x, y \in \Box a, x \cap y = \emptyset. \, f(x) - f(y) \subseteq b \cdot (x - y)^{\circ}$$

In the one-dimensional case, this new notion is a modification of that in [12], as in Definition 1 we require the hyper-rectangles x and y to be disjoint, i.e., inconsistent in **I**U. Thus, the condition for membership of a tie is weaker. We will need this weaker condition in order to develop the Stone duality result later in the paper.

We show that despite this weaker notion, if f ∈ δ(□a, b) with b ≠ ⊥, then f preserves maximal elements and its restriction to maximal elements gives a Lipschitz map. In other words, f is the extension of a classical Lipschitz function to **I**a.

**Proposition 2.** *Let* f ∈ δ(□a, b)*, where* a ⊂ R<sup>n</sup> *is an open hyper-rectangle and* b ∈ **C**(R<sup>n</sup>) \ {⊥}*. Then for each* x ∈ a*,* f({x}) ∈ **I**R *is maximal, and the induced function* f̂ : a ⊂ R<sup>n</sup> → R *is Lipschitz and satisfies:*

$$\forall x\_1, x\_2 \in a. \left( b \cdot (x\_1 - x\_2) \right)^- \le \hat{f}(x\_1) - \hat{f}(x\_2) \le \left( b \cdot (x\_1 - x\_2) \right)^+ \tag{2}$$

$$\forall x\_1, x\_2 \in a. \left| \hat{f}(x\_1) - \hat{f}(x\_2) \right| \le \|b\| \|x\_1 - x\_2\|,\tag{3}$$

*where* ‖b‖ = max{‖L‖ : L ∈ b}*.*

**Corollary 1.** *If* f ∈ δ(□a, b) *then* f̂ ∈ δ(a, b)*.*

**Definition 2.** *We say a Scott continuous function of type* **I**U ⊂ **I**R<sup>n</sup> → **I**R *is* locally Lipschitz *in* a*, for* a ∈ T(U)*, if it belongs to a tie* δ(□a, b) *with* b ≠ ⊥*.*

Given a continuous function f : U ⊆ R<sup>n</sup> → R, its maximal extension to a Scott continuous function **I**U ⊆ **I**R<sup>n</sup> → **I**R is denoted by **I**f, with **I**f(x) = f[x] for x ∈ **I**U when x ≠ ⊥, and **I**f(⊥) = ⊥.

**Corollary 2.** f ∈ δ(a, b) *iff* **I**f ∈ δ(□a, b)*.*

If (A, ⊑) is a dcpo, then the consistency predicates Con<sub>(A,⊑)</sub> and Con<sub>(A,≪)</sub> for a finite subset {a<sub>i</sub> : i ∈ I}, with respect to ⊑ and ≪ respectively, are defined as follows:

$$\mathsf{Con}_{(A,\sqsubseteq)}\{a_i : i \in I\} \Longleftrightarrow \exists a \in A.\ \forall i \in I.\ a_i \sqsubseteq a$$

and

$$\mathsf{Con}_{(A,\ll)}\{a_i : i \in I\} \Longleftrightarrow \exists a \in A.\ \forall i \in I.\ a_i \ll a$$

For a collection (b<sub>i</sub>χ<sub>a<sub>i</sub></sub>)<sub>i∈I</sub> or (b<sub>i</sub>χ<sub>□a<sub>i</sub></sub>)<sub>i∈I</sub>, with finite indexing set I, where the a<sub>i</sub> ∈ Ω(R<sup>n</sup>) are open hyper-rectangles and b<sub>i</sub> ∈ (D, ⊑), the function space consistency predicates Con<sub>R<sup>n</sup>→D</sub> and Con<sub>**I**R<sup>n</sup>→D</sub> are defined as follows:

$$\mathsf{Con}_{\mathbb{R}^n \to D}(b_i \chi_{a_i})_{i \in I} \iff \forall J \subseteq I.\ \left[\mathsf{Con}_{(\Omega(\mathbb{R}^n), \supseteq)}\{a_i : i \in J\} \Rightarrow \mathsf{Con}_{(D, \sqsubseteq)}\{b_i : i \in J\}\right]$$

$$\mathsf{Con}_{\mathbf{I}\mathbb{R}^n \to D}(b_i \chi_{\Box a_i})_{i \in I} \iff \forall J \subseteq I.\ \left[\mathsf{Con}_{(\Omega(\mathbf{I}\mathbb{R}^n), \supseteq)}\{\Box a_i : i \in J\} \Rightarrow \mathsf{Con}_{(D, \sqsubseteq)}\{b_i : i \in J\}\right].$$

It follows that the supremum sup<sub>i∈I</sub> b<sub>i</sub>χ<sub>a<sub>i</sub></sub> exists iff Con<sub>R<sup>n</sup>→D</sub>(b<sub>i</sub>χ<sub>a<sub>i</sub></sub>)<sub>i∈I</sub>, and sup<sub>i∈I</sub> b<sub>i</sub>χ<sub>□a<sub>i</sub></sub> exists iff Con<sub>**I**R<sup>n</sup>→D</sub>(b<sub>i</sub>χ<sub>□a<sub>i</sub></sub>)<sub>i∈I</sub>.
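For finitely many step functions with interval data, the predicate Con can be checked by brute force: for every subfamily whose domains overlap, the corresponding values must have a common refinement. A Python sketch for the one-dimensional case (the representation and helper names are ours):

```python
from itertools import combinations

def meets(xs):
    """True iff finitely many open intervals (lo, hi) have a common point."""
    lo, hi = max(x[0] for x in xs), min(x[1] for x in xs)
    return lo < hi

def consistent_values(bs):
    """Compact intervals are consistent in IR iff they share a point."""
    lo, hi = max(b[0] for b in bs), min(b[1] for b in bs)
    return lo <= hi

def con(steps):
    """Con for a finite family of step functions, given as pairs (a_i, b_i):
    whenever the domains of a subfamily intersect, the values must agree."""
    return all(consistent_values([b for _, b in sub])
               for r in range(1, len(steps) + 1)
               for sub in combinations(steps, r)
               if meets([a for a, _ in sub]))

# Overlapping domains with disjoint values: inconsistent.
assert not con([((0, 2), (0, 0)), ((1, 3), (1, 1))])
# Overlapping domains with overlapping values: consistent.
assert con([((0, 2), (0, 1)), ((1, 3), (1, 2))])
```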

**Proposition 3.** *For any indexing set* J*, the family of step functions* (b<sub>j</sub>χ<sub>□a<sub>j</sub></sub>)<sub>j∈J</sub> *is consistent if* ⋂<sub>j∈J</sub> δ(□a<sub>j</sub>, b<sub>j</sub>) ≠ ∅*.*

*Proof.* Suppose f ∈ ⋂<sub>j∈J</sub> δ(□a<sub>j</sub>, b<sub>j</sub>). Then f̂ ∈ ⋂<sub>j∈J</sub> δ(a<sub>j</sub>, b<sub>j</sub>), and hence (b<sub>j</sub>χ<sub>a<sub>j</sub></sub>)<sub>j∈J</sub> is consistent, which implies that (b<sub>j</sub>χ<sub>□a<sub>j</sub></sub>)<sub>j∈J</sub> is consistent.

Recall that a crescent in R<sup>n</sup> is the intersection of a closed set and an open set. Given two points p, q ∈ R<sup>n</sup>, we denote the closed, respectively open, line segment between them by [p, q] = {λp + (1 − λ)q : 0 ≤ λ ≤ 1}, respectively (p, q) = {λp + (1 − λ)q : 0 < λ < 1}.

**Proposition 4.** *We have* δ(□a, b) ⊇ ⋂<sub>j∈J</sub> δ(□a<sub>j</sub>, b<sub>j</sub>) *if* bχ<sub>□a</sub> ≪ sup<sub>j∈J</sub> b<sub>j</sub>χ<sub>□a<sub>j</sub></sub>*.*

*Proof.* Let g := sup<sub>j∈J</sub> b<sub>j</sub>χ<sub>□a<sub>j</sub></sub>. Suppose bχ<sub>□a</sub> ≪ g; then □a ⊆ ⋃<sub>j∈J</sub> □a<sub>j</sub> and thus a ⊆ ⋃<sub>j∈J</sub> a<sub>j</sub>. In addition, by considering the restriction of g to the maximal elements of **I**R<sup>n</sup>, we find that a is partitioned by the open sets a<sub>j</sub>, j ∈ J, into a finite number of disjoint crescents c<sub>i</sub>, i ∈ I, with

$$g(\{r\}) = \sup_{c_i \subset a_j} b_j =: b_i$$

for r ∈ c<sub>i</sub>. Let f ∈ ⋂<sub>j∈J</sub> δ(□a<sub>j</sub>, b<sub>j</sub>). We show that f ∈ δ(□a, b). Suppose we have two hyper-rectangles x, y ∈ □a with x ∩ y = ∅. Let the points p ∈ x and q ∈ y be such that ‖p − q‖ is the minimum distance between x and y. Then [p, q] is partitioned by the crescents c<sub>i</sub>, i ∈ I, into a finite number of one-dimensional intervals such that the one-dimensional interior of each is contained in c<sub>i</sub> for some i ∈ I. Let r<sub>0</sub>, r<sub>1</sub>, ..., r<sub>k</sub> ∈ R<sup>n</sup> be the boundary points of these intervals, ordered from p to q. Then, using the continuity of f̂, we have:

$$f(\{r_t\}) - f(\{r_{t+1}\}) \subseteq \sup_{\{r_t, r_{t+1}\} \subseteq c_j} b_j \cdot (\{r_t\} - \{r_{t+1}\}) \subseteq b \cdot (\{r_t\} - \{r_{t+1}\})$$

for 0 ≤ t ≤ k − 1. Since x ∈ □a, there exists j ∈ J with x ∈ □a<sub>j</sub>. Moreover, x ⊆ a<sub>j</sub> iff r<sub>0</sub> ∈ a<sub>j</sub>. Similarly, y ⊆ a<sub>j</sub> iff r<sub>k</sub> ∈ a<sub>j</sub>. From these relations, we obtain:

$$f(x) - f(\{r_0\}) \subseteq \sup_{x \subseteq a_j} b_j \cdot (x - \{r_0\}), \quad f(\{r_k\}) - f(y) \subseteq \sup_{y \subseteq a_j} b_j \cdot (\{r_k\} - y)$$

Thus,

$$\begin{aligned} f(x) - f(y) &= f(x) - f(\{r_0\}) + \sum_{t=0}^{k-1}\left(f(\{r_t\}) - f(\{r_{t+1}\})\right) + f(\{r_k\}) - f(y) \\ &\subseteq b \cdot \left( (x - \{r_0\}) + \sum_{t=0}^{k-1}(\{r_t\} - \{r_{t+1}\}) + (\{r_k\} - y) \right) = b \cdot (x - y) \quad\blacksquare \end{aligned}$$

**Definition 3.** *The derivative of a Scott continuous map* f : **<sup>I</sup>**U <sup>⊂</sup> **<sup>I</sup>**R<sup>n</sup> <sup>→</sup> **<sup>I</sup>**<sup>R</sup> *is the map:*

$$\mathcal{L}f = \sup_{f \in \delta(\Box a, b)} b\chi_{\Box a} \;:\; \mathbf{I}U \to \mathbf{C}(\mathbb{R}^n),$$

*where* <sup>U</sup> *is a convex open subset of* <sup>R</sup><sup>n</sup>*.*

**Theorem 1.** *(i)* Lf *is well-defined and Scott continuous. (ii)* f ∈ δ(□a, b) *iff* bχ<sub>□a</sub> ⊑ Lf*.*


If f : U ⊆ R<sup>n</sup> → R is a locally Lipschitz map, then the Clarke sub-gradient Lf : U → **C**(R<sup>n</sup>) extends, by Scott's extension theory for densely injective spaces [24], to a Scott continuous map **I**(Lf) : **I**U → **C**(R<sup>n</sup>). We then have:

#### **Proposition 5.**

$$\mathcal{L}(\mathbf{I}f) = \mathbf{I}(\mathcal{L}f)$$

*Proof.* This follows from the relation:

$$f \in \delta(a, b) \iff \mathbf{I}f \in \delta(\Box a, b),$$

for all a ∈ Ω(U) and b ∈ **C**(R<sup>n</sup>).

The following example shows that, in the context of the L-derivative of interval functions, Clarke's weak calculus no longer holds for the sum rule.

*Example 1.* Let f, g : **I**R → **I**R be defined by f(x) = x and g(x) = −x. Then Lf(x) = {1} and Lg(x) = {−1}, and thus Lf(x) + Lg(x) = {0}. On the other hand, (f + g)(x) = f(x) + g(x) = x − x, and it follows that f + g ∉ δ(□a, {0}) for any open set a ⊂ R; consequently, L(f + g) ≠ {0}. Hence, L(f + g)(x) ⊈ Lf(x) + Lg(x).
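Example 1 can be verified directly in interval arithmetic: with subtraction x − y = [x⁻ − y⁺, x⁺ − y⁻], the dependency between the two occurrences of x in x − x is lost, so the result widens instead of cancelling. A minimal sketch (the endpoint-pair representation is ours):

```python
def isub(x, y):
    """x - y = [x_lo - y_hi, x_hi - y_lo] in interval arithmetic."""
    return (x[0] - y[1], x[1] - y[0])

x = (0.0, 1.0)
# (f + g)(x) = x - x is not {0}: the two occurrences of x vary
# independently, so the result has twice the width of x.
assert isub(x, x) == (-1.0, 1.0)
```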

We say an interval [r<sup>−</sup>, r<sup>+</sup>] is *positive*, respectively *negative*, if r<sup>−</sup> > 0, respectively r<sup>+</sup> < 0. The above counter-example is a consequence of the fact that in interval arithmetic, while the relation (u + v)w ⊆ uw + vw always holds for u, v, w ∈ **I**R, the converse relation (u + v)w ⊇ uw + vw may fail. However, if u and v are both positive or both negative, then the converse also holds [21, p. 13].
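Both the sub-distributivity inclusion and the sufficiency of the sign condition can be checked on concrete intervals. A Python sketch with the standard endpoint formulas of interval arithmetic (representation and names are ours):

```python
def iadd(x, y):
    """Interval sum."""
    return (x[0] + y[0], x[1] + y[1])

def imul(x, y):
    """Interval product: min/max over the four endpoint products."""
    ps = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(ps), max(ps))

def subset(x, y):
    return y[0] <= x[0] and x[1] <= y[1]

# Mixed signs: (u + v)w is strictly contained in uw + vw.
u, v, w = (0, 1), (-1, 0), (-1, 1)
lhs = imul(iadd(u, v), w)
rhs = iadd(imul(u, w), imul(v, w))
assert subset(lhs, rhs) and lhs != rhs

# u and v both positive: the two sides coincide.
u, v = (1, 2), (1, 2)
assert imul(iadd(u, v), w) == iadd(imul(u, w), imul(v, w))
```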

We can obtain a weak calculus for the sum and product of two functions f and g if we first use an operation that is routinely performed in interval analysis, namely approximating each of the values Lf(x) and Lg(x) by the smallest axes-aligned hyper-rectangle containing it, and then assume that the two induced hyper-rectangles have the same sign in each of their components. We now formalise this procedure.

Let H : **C**(R<sup>n</sup>) → **I**R<sup>n</sup> be the map that takes every convex compact set to the smallest axes-aligned hyper-rectangle containing it. It is easy to check that H is Scott continuous. Let π<sub>i</sub> : R<sup>n</sup> → R be the projection onto the i-th coordinate, and let **I**π<sub>i</sub> : **I**R<sup>n</sup> → **I**R be its maximal extension. Define the predicate Sgn ⊂ (**I**R<sup>n</sup>)<sup>2</sup> by (x, y) ∈ Sgn if for each i = 1, ..., n the two intervals **I**π<sub>i</sub>(x) and **I**π<sub>i</sub>(y) are either both positive or both negative.
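On a convex compact set represented by finitely many points (e.g. the vertices of a polytope, whose convex hull is the set), H is the componentwise bounding-box operation. A Python sketch (the finite-vertex representation is our assumption, not from the paper):

```python
def H(points):
    """Smallest axes-aligned hyper-rectangle containing the convex hull
    of `points`: one [min, max] interval per coordinate."""
    n = len(points[0])
    return [(min(p[i] for p in points), max(p[i] for p in points))
            for i in range(n)]

# Triangle in R^2 with vertices (0,0), (2,0), (1,1):
# its bounding box is [0,2] x [0,1].
assert H([(0, 0), (2, 0), (1, 1)]) == [(0, 2), (0, 1)]
```

Enlarging the input set can only enlarge each min/max range, which is why H is monotone and, in fact, Scott continuous.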

Suppose x, y, z ∈ **I**R<sup>n</sup> and (y, z) ∈ Sgn. Then the interval **I**π<sub>i</sub>(y)**I**π<sub>i</sub>(z) is positive for each i = 1, ..., n, and we have x(y + z) = xy + xz. In fact,

$$\mathbf{I}\pi\_i(x)(\mathbf{I}\pi\_i(y) + \mathbf{I}\pi\_i(z)) = \mathbf{I}\pi\_i(x)\mathbf{I}\pi\_i(y) + \mathbf{I}\pi\_i(x)\mathbf{I}\pi\_i(z),$$

and hence:

$$\begin{aligned} x(y+z) &= \sum\_{i=1}^{n} \mathbf{I}\pi\_i(x)(\mathbf{I}\pi\_i(y) + \mathbf{I}\pi\_i(z)) = \sum\_{i=1}^{n} \mathbf{I}\pi\_i(x)\mathbf{I}\pi\_i(y) + \mathbf{I}\pi\_i(x)\mathbf{I}\pi\_i(z) \\ &= \sum\_{i=1}^{n} \mathbf{I}\pi\_i(x)\mathbf{I}\pi\_i(y) + \sum\_{i=1}^{n} \mathbf{I}\pi\_i(x)\mathbf{I}\pi\_i(z) = xy + xz \end{aligned}$$

**Proposition 6.** *Suppose* f,g : **<sup>I</sup>**U <sup>⊆</sup> **<sup>I</sup>**R<sup>n</sup> <sup>→</sup> **<sup>I</sup>**<sup>R</sup> *are locally Lipschitz functions and* x <sup>∈</sup> **<sup>I</sup>**U *is such that* (H(Lf(x)), H(Lg(x))) <sup>∈</sup> Sgn*. Then:*

*1.*

$$H(\mathcal{L}f(x)) + H(\mathcal{L}g(x)) \supseteq H(\mathcal{L}(f+g)(x))$$

*2. If, in addition,* (f(x), g(x)) <sup>∈</sup> Sgn*, then we also have:*

$$f(x)H(\mathcal{L}g(x)) + g(x)H(\mathcal{L}f(x)) \supseteq H(\mathcal{L}(fg)(x))$$

We will provide the proof of a weak form of the chain rule, which is more involved than those for sum and product. First, consider the extended scalar multiplication M : **C**(R<sup>n</sup>) × **I**R<sup>+</sup> → **C**(R<sup>n</sup>), where R<sup>+</sup> is the set of non-negative reals, with M(b, x) = {ur : u ∈ b, r ∈ x}. Then M is well-defined and Scott continuous. For ease of presentation, we write M(b, x) = bx.

**Proposition 7.** *If* $g : \mathbf{I}U_1 \subseteq \mathbf{I}\mathbb{R}^n \to \mathbf{I}\mathbb{R}$ *and* $f : \mathbf{I}U_2 \subseteq \mathbf{I}\mathbb{R} \to \mathbf{I}\mathbb{R}$ *are Scott continuous with* $\mathrm{Im}(g) \subseteq \mathbf{I}U_2$ *and* $(\mathcal{L}f)(g(x)) \in \mathbf{I}\mathbb{R}^+$*, then:*

$$((\mathcal{L}f)\circ g)(x)\mathcal{L}g(x) \supseteq \mathcal{L}(f\circ g)(x)$$

#### **3 Lipschitzian Approximable Mapping**

Recall that, since $\mathbf{I}\mathbb{R}^n$, $\mathbf{C}(\mathbb{R}^n)$ and $\mathbb{R}^n$ are stably locally compact spaces, and the category of stably locally compact spaces with continuous functions is equivalent to the category of semi-strong proximity lattices with approximable mappings, any continuous function $f : \mathbf{I}U \subseteq \mathbf{I}\mathbb{R}^n \to \mathbf{I}\mathbb{R}$ defines an approximable mapping $A_f : B_{\mathbf{I}U} \to B_{\mathbf{I}\mathbb{R}}$ by $a \, A_f \, a' \iff a \ll f^{-1}(a')$. Conversely, any approximable mapping of type $R : B_{\mathbf{I}\mathbb{R}^n} \to B_D$, where $D$ is either $\mathbf{I}\mathbb{R}$, $\mathbf{I}\mathbb{R}^n$ or $\mathbf{C}(\mathbb{R}^n)$, gives us a continuous function $G_R : \mathbf{I}\mathbb{R}^n \to D$.

**Lemma 1.** *Let* $f : \mathbf{I}U \subseteq \mathbf{I}\mathbb{R}^n \to \mathbf{I}\mathbb{R}$ *be a Scott continuous function such that* $f(\{x\})$ *is a singleton for all* $x \in U$*. Suppose* $a_1$ *is an open hyper-rectangle in* $U$ *and* $a_2$ *is an open interval. If* $\hat{f} : U \subseteq \mathbb{R}^n \to \mathbb{R}$ *is the induced function with* $f(\{x\}) = \{\hat{f}(x)\}$*, then:*

$$
\Box a\_1 \ll f^{-1}(\Box a\_2) \Rightarrow a\_1 \ll \hat{f}^{-1}(a\_2) \qquad \Box a\_1 \, A\_f \, \Box a\_2 \Rightarrow a\_1 \, A\_{\hat{f}} \, a\_2
$$

Recall the definition of the predicate $\mathsf{Sep} \subseteq B_{\mathbb{R}} \times B_{\mathbb{R}}$ from Subsect. 1.3.

**Definition 4.** *We say an approximable mapping* $R : B_{\mathbf{I}U} \to B_{\mathbf{I}\mathbb{R}}$*, where* $U \subseteq \mathbb{R}^n$ *is a convex open set, has* Lipschitzian constant $O$ *in* $a$*, with* $O \in B^0_{\mathbb{R}^n}$ *and* $a \in \mathcal{T}(U)$*, if:*

$$\begin{aligned} \forall a\_1, a\_2 \in \mathcal{T}(U).\; &a\_1, a\_2 \prec a \;\&\; (a\_1, a\_2) \in \mathsf{Sep} \Rightarrow \exists a'\_1, a'\_2 \in B\_{\mathbb{R}}.\\ &\square a\_1 \, R \, \square a'\_1,\; \square a\_2 \, R \, \square a'\_2 \;\&\; a'\_1 - a'\_2 \prec O \cdot (a\_1 - a\_2), \end{aligned}$$

*and we say* R *is* Lipschitzian *in* a*. The set of all approximable mappings with the above property is denoted by* Δ(a, O)*, called the* knot *of* a *and* O*.*

Note that, by Proposition 1, the last formula in Definition 4 is equivalent to $a'_1 - a'_2 \prec \Box O \cdot (a_1 - a_2)$. Given this equivalence, it is simpler to use the formula without the modal operator, as we have done in this definition. By Proposition 1 and Stone duality, we have:

**Proposition 8.** *Suppose* $f : \mathbf{I}U \to \mathbf{I}\mathbb{R}$ *is a Scott continuous function such that* $f(\{x\})$ *is a singleton for every* $x \in U$*. Then we have:* $A_{\hat{f}} \in \Delta(a, O)$ *if* $A_f \in \Delta(a, O)$*.*

From $\Delta(a, O)$, a Lipschitz property of $G_R$ can be deduced as follows.

**Proposition 9.** *If* $R : B_{\mathbf{I}U} \to B_{\mathbf{I}\mathbb{R}}$ *is an approximable mapping such that* $R \in \Delta(a, O)$*, then:*

$$\forall x, y \in \Box a. \, x \cap y = \emptyset \Rightarrow G\_R(x) - G\_R(y) \subseteq \overline{O} \cdot (x - y)$$

*Proof.* Let $x, y \in \Box a$ with $x \cap y = \emptyset$, and consider $a_1, a_2 \in \mathcal{T}(U)$ such that $(a_1, a_2) \in \mathsf{Sep}$, $x \in \Box a_1$ and $y \in \Box a_2$. Hence, there exist $a'_1, a'_2 \in B_{\mathbb{R}}$ such that $\Box a_i \, R \, \Box a'_i$ for $i = 1, 2$ and:

$$a\_1' - a\_2' \prec O \cdot (a\_1 - a\_2)$$

By Stone duality we have $R = R_{G_R}$. Hence $\Box a_i \prec G_R^{-1}(\Box a'_i)$ for $i = 1, 2$, and thus:

$$G\_R(x) - G\_R(y) \subseteq O \cdot (a\_1 - a\_2).$$

Since this holds for all sufficiently small $a_1$ and $a_2$ that contain $x$ and $y$ respectively, we obtain $G_R(x) - G_R(y) \subseteq \overline{O} \cdot (x - y)$.

**Corollary 3.** *If* $R \in \Delta(a, O)$ *then* $G_R \in \delta(a, \overline{O})$*.*

Thus, if $A_f$ is a Lipschitzian approximable mapping of type $B_{\mathbf{I}U} \to B_{\mathbf{I}\mathbb{R}}$, then $f$ is a Lipschitz function of type $\mathbf{I}U \to \mathbf{I}\mathbb{R}$; hence $f(\{x\})$ is a singleton for every $x \in U$, and the induced function $\hat{f} : U \to \mathbb{R}$ is also Lipschitz.

Now we are in a position to obtain duality results similar to those in [13] for functions of type $\mathbf{I}U \subseteq \mathbf{I}\mathbb{R}^n \to \mathbf{I}\mathbb{R}$.

**Proposition 10.** *Let* $f \in \delta(a, b)$*. Then for every* $a_0 \in \mathcal{T}(U)$ *such that* $a_0 \prec a$ *and every* $O \in B^0_{\mathbb{R}^n}$ *such that* $b \subseteq O$ *we have* $A_f \in \Delta(a_0, O)$*.*

*Proof.* Suppose $a_0 \prec a$. Let $a_1, a_2 \in \mathcal{T}(U)$ with $(a_1, a_2) \in \mathsf{Sep}$ and $a_1, a_2 \prec a_0$. Then, since $\overline{a_1}, \overline{a_2} \in \mathbf{I}U$, from the definition of the tie $\delta(a, b)$ we have:

$$\begin{aligned} f(\overline{a\_1}) - f(\overline{a\_2}) &\subseteq b \cdot (\overline{a\_1} - \overline{a\_2}) \\ &\subseteq O \cdot (a\_1 - a\_2). \end{aligned}$$

Since $f(\overline{a_1}), f(\overline{a_2}) \in \mathbf{I}\mathbb{R}$ are compact, there exist open intervals $a'_1, a'_2 \in B_{\mathbb{R}}$ such that $f(\overline{a_i}) \subseteq a'_i$, $i = 1, 2$, and $a'_1 - a'_2 \prec O \cdot (a_1 - a_2)$. This implies $A_f \in \Delta(a_0, O)$.

*Example 2.* Let $f : \mathbf{I}\mathbb{R} \to \mathbf{I}\mathbb{R}$ be given by:

$$f([x\_1, x\_2]) = [x\_1 - \delta(x\_2 - x\_1), x\_2 + \delta(x\_2 - x\_1)]$$

for some $\delta > 0$. The restriction $\hat{f}$ of $f$ to the maximal elements of $\mathbf{I}\mathbb{R}$ is the identity function $\hat{f} = \mathrm{Id} : \mathbb{R} \to \mathbb{R}$. Since $\mathbf{I}\mathrm{Id} \neq f$, the map $f$ is not the maximal extension of the identity map $\mathrm{Id}$. On the other hand, $A_f : B_{\mathbf{I}\mathbb{R}} \to B_{\mathbf{I}\mathbb{R}}$ satisfies $A_f \in \Delta(\Box\mathbb{R}, O)$ iff $(1 - \delta, 1 + \delta) \subseteq O$, whereas $A_{\hat{f}} \in \Delta(\Box\mathbb{R}, O)$ iff $1 \in O$.
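A quick numerical check of Example 2 (a hedged sketch, with `delta` standing for the $\delta$ of the text): on degenerate intervals the function acts as the identity, while it strictly inflates every proper interval, so it differs from the maximal extension of the identity, which maps each interval to itself.

```python
# Example 2 as a Python function on intervals given as endpoint pairs.
def f(interval, delta=0.5):
    x1, x2 = interval
    w = x2 - x1                      # the width x2 - x1 of the input interval
    return (x1 - delta * w, x2 + delta * w)

# On maximal (degenerate) intervals {x}, f acts as the identity...
assert f((3.0, 3.0)) == (3.0, 3.0)
# ...but a proper interval is strictly inflated, so f is not I(Id).
assert f((0.0, 1.0)) == (-0.5, 1.5)
```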

The following two propositions establish a domain isomorphism between the function space $(\mathbf{I}U \to \mathbf{C}(\mathbb{R}^n))$ and the domain of approximable mappings $(B_{\mathbf{I}U} \to B_{\mathbf{C}(\mathbb{R}^n)})$, ordered by inclusion.

**Proposition 11.** *1. For* $f_1, f_2 : \mathbf{I}U \to \mathbf{C}(\mathbb{R}^n)$ *we have:*

$$f\_1 \subseteq f\_2 \Longleftrightarrow A\_{f\_1} \subseteq A\_{f\_2}$$

*2. For* $R_1, R_2 : B_{\mathbf{I}U} \to B_{\mathbf{C}(\mathbb{R}^n)}$ *we have:*

$$R\_1 \subseteq R\_2 \Longleftrightarrow G\_{R\_1} \subseteq G\_{R\_2}$$

**Proposition 12.** *1. If* $(f_i)_{i \in I}$ *is a directed set in* $\mathbf{I}U \to \mathbf{C}(\mathbb{R}^n)$*, with supremum* $f = \sup_{i \in I} f_i$*, then* $\bigcup_{i \in I} A_{f_i} = A_f$ *in* $\mathrm{App}(B_{\mathbf{I}U}, B_{\mathbf{C}(\mathbb{R}^n)})$*.*

*2. If* $(R_i)_{i \in I}$ *is a directed set in* $\mathrm{App}(B_{\mathbf{I}U}, B_{\mathbf{C}(\mathbb{R}^n)})$*, then* $\sup_{i \in I} G_{R_i} = G_R$ *in* $(\mathbf{I}U \to \mathbf{C}(\mathbb{R}^n))$*, where* $R = \sup_{i \in I} R_i$*.*

**Definition 5.** *If* $a$ *is an open hyper-rectangle and* $O$ *is a basic convex open set, then the single-step approximable mapping* $\eta_{(a, O)}$ *is defined as* $\eta_{(a, O)} = A_{O\chi_a} : B_{\mathbf{I}U} \to B_{\mathbf{C}(\mathbb{R}^n)}$*.*

To define the Lipschitzian derivative of an approximable mapping, we first need the notions of a strong tie and a strong knot.

**Definition 6.** *We say* $f : \mathbf{I}U \to \mathbf{I}\mathbb{R}$ *has a* strong set-valued Lipschitz constant $b \in \mathbf{C}(\mathbb{R}^n)$ *in* $a$*, for* $a \in \mathcal{T}(U)$*, denoted by* $f \in \delta_s(a, b)$*, if there exist* $a'$ *with* $a \prec a'$ *and* $b' \in \mathbf{C}(\mathbb{R}^n)$ *with* $b \ll_{\mathbf{C}(\mathbb{R}^n)} b'$ *such that* $f \in \delta(a', b')$*. We call* $\delta_s(a, b)$ *the strong single-tie of* $a$ *with* $b$*.*

From general results about single-step functions [16], we know that if $b\chi_{\Box a} \ll \mathcal{L}f$ then for every $x \in \Box a$ we have $b \ll \mathcal{L}f(x)$, and hence $\mathcal{L}f(x) \in {\uparrow\uparrow}b$. This means $\mathcal{L}f(\Box a) \subseteq {\uparrow\uparrow}b$. Moreover, $\Box a \ll (\mathcal{L}f)^{-1}({\uparrow\uparrow}b)$.

Similar to Proposition VII.3 in [13] and its corollary, we have:

**Proposition 13.** *If* $f : \mathbf{I}U \to \mathbf{I}\mathbb{R}$ *is locally Lipschitz, then:*

$$f \in \delta\_s(\Box a, b) \Longleftrightarrow b\chi\_{\Box a} \ll \mathcal{L}f$$

$$\mathcal{L}f = \sup\{b\chi\_{\Box a} : b\chi\_{\Box a} \ll \mathcal{L}f\} = \sup\{b\chi\_{\Box a} : f \in \delta\_s(\Box a, b)\}$$

**Definition 7.** *We say an approximable mapping* $R : B_{\mathbf{I}U} \to B_{\mathbf{I}\mathbb{R}}$ *has* strong Lipschitz constant $O$ *in* $a$*, for* $O \in B^0_{\mathbb{R}^n}$ *and* $a \in \mathcal{T}(U)$*, denoted by* $R \in \Delta_s(a, O)$*, if there exist* $a' \in \mathcal{T}(U)$ *with* $a \prec a'$ *and* $O' \in B^0_{\mathbb{R}^n}$ *with* $O' \prec O$ *such that* $R \in \Delta(a', O')$*.*

**Proposition 14.** *1. If* $f \in \delta_s(a, b)$ *then for all* $O \in B^0_{\mathbb{R}^n}$ *with* $b \subseteq O$ *we have* $A_f \in \Delta_s(a, O)$*.*


Finally, we obtain the duality between strong ties and strong knots extending the main result in [13] to functions with interval input and output.

**Corollary 4.** *We have* $R \in \Delta_s(a, O)$ *iff* $G_R \in \delta_s(a, \overline{O})$*. Dually, we have* $f \in \delta_s(a, b)$ *iff* $A_f \in \Delta_s(a, b^\circ)$*.*

**Definition 8.** *Let* $R : B_{\mathbf{I}U} \to B_{\mathbf{I}\mathbb{R}}$ *be a Lipschitzian approximable mapping. The Lipschitzian derivative of* $R$ *is defined as:*

$$L(R) = \sup \{ \eta\_{(\square a, O)} : R \in \Delta\_s(\square a, O) \}$$

*which is of type* $B_{\mathbf{I}U} \to B_{\mathbf{C}(\mathbb{R}^n)}$*.*

The following theorem extends Theorem VII.12 in [13] to functions with interval input and output.

**Theorem 2.** *The Lipschitzian derivative of a Lipschitzian approximable mapping* $R : B_{\mathbf{I}U} \to B_{\mathbf{I}\mathbb{R}}$ *is an approximable mapping and we have:* $L(R) = A_{\mathcal{L}G_R}$*.*

#### **4 Conclusion**

We have developed a notion of sub-differentiation for Scott continuous maps that take hyper-rectangles in a finite-dimensional Euclidean space to compact real intervals; sub-differentiation is itself a Scott continuous operation. This extends the domain of application of interval analysis to the classical derivative. It also extends Clarke's theory and that of the L-derivative to functions with imprecise input/output, as one encounters in interval analysis and exact real number computation. The classical Clarke operator commutes with the extension operator that extends a non-empty convex and compact valued map on a finite-dimensional Euclidean space to the space of hyper-rectangles of that space. We have derived a calculus for sub-differentiation of interval maps which is weaker than the corresponding Clarke calculus for point maps. A Stone duality framework for sub-differentiation of interval maps is also constructed, which allows for a program logic view of sub-differentiation. We envisage several areas for immediate further work, namely an implementation of this work in Haskell, an implementation in a theorem prover such as Coq, and a derivation of a weak calculus for constructors of approximable mappings that would match the calculus for interval functions.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **The Effects of Adding Reachability Predicates in Propositional Separation Logic**

Stéphane Demri<sup>1</sup>, Étienne Lozes<sup>2</sup>, and Alessio Mansutti<sup>1(B)</sup>

<sup>1</sup> LSV, CNRS, ENS Paris-Saclay, Université Paris-Saclay, Cachan, France alessio.mansutti@lsv.fr <sup>2</sup> I3S, Université Côte d'Azur, Nice, France

**Abstract.** The list segment predicate ls used in separation logic for verifying programs with pointers is well-suited to express properties on singly-linked lists. We study the effects of adding ls to the full propositional separation logic with the separating conjunction and implication, which is motivated by the recent design of new fragments in which all these ingredients are used indifferently and verification tools start to handle the magic wand connective. This is a very natural extension that has not been studied so far. We show that the restriction without the separating implication can be solved in polynomial space by using an appropriate abstraction for memory states whereas the full extension is shown undecidable by reduction from first-order separation logic. Many variants of the logic and fragments are also investigated from the computational point of view when ls is added, providing numerous results about adding reachability predicates to propositional separation logic.

### **1 Introduction**

Separation logic [20,25,28] is a well-known assertion logic for reasoning about programs with dynamic data structures. Since the implementation of Smallfoot and the evidence that the method is scalable [3,33], many tools supporting separation logic as an assertion language have been developed [3,8,9,16,17,33]. Even though the first tools could handle relatively limited fragments of separation logic, like symbolic heaps, there is a growing interest and demand to consider extensions with richer expressive power. We can point out three particular extensions of symbolic heaps (without list predicates) that have been proved decidable.

– Symbolic heaps with generalised inductive predicates, which add a fixpoint combinator to the language, form a convenient logic for specifying data structures more advanced than lists or trees. The entailment problem is known to be decidable by means of tree automata techniques for the bounded treewidth fragment [1,19], whereas satisfiability is ExpTime-complete [6]. Other related results can be found in [21].


A natural question is how to combine these extensions, and which separation logic fragments allowing Boolean connectives, the magic wand and generalised recursive predicates can be decided under adequate restrictions. As already advocated in [7,18,24,29,31], dealing with the separating implication −∗ is a desirable feature for program verification, and several semi-automated or automated verification tools support it in some way, see e.g. [18,24,29,31].

*Our Contribution.* In this paper, we address the question of combining magic wand and inductive predicates in the extremely limited case where the only inductive predicate is the gentle list segment predicate ls. So the starting point of this work is this puzzling question: what is the complexity/decidability status of propositional separation logic SL(∗, −∗) enriched with the list segment predicate ls (herein called SL(∗, −∗, ls))? More precisely, we study the decidability/complexity status of extensions of propositional separation logic SL(∗, −∗) by adding one of the reachability predicates among ls (precise predicate as usual in separation logic), reach (existence of a path, possibly empty) and reach<sup>+</sup> (existence of a non-empty path).

First, we establish that the satisfiability problem for the propositional separation logic SL(∗, −∗, ls) is undecidable. Our proof is by reduction from the undecidability of first-order separation logic [5,14], using an encoding of the variables as heap cells (see Theorem 1). As a consequence, we also establish that SL(∗, −∗, ls) is not finitely axiomatisable. Moreover, our reduction requires a rather limited expressive power of the list segment predicate, and we can strengthen our undecidability results to some fragments of SL(∗, −∗, ls). For instance, surprisingly, the extension of SL(∗, −∗) with the atomic formulae of the form reach(x, y) = 2 and reach(x, y) = 3 (existence of a path between x and y of respective length 2 or 3) is already undecidable, whereas the satisfiability problem for SL(∗, −∗, reach(x, y) = 2) is known to be in PSpace [15].

Second, we show that the satisfiability problem for SL(∗, reach<sup>+</sup>) is PSpace-complete, extending the well-known result on SL(∗). The PSpace upper bound relies on a small heap property based on the techniques of test formulae, see e.g. [4,15,22,23], and the PSpace-hardness of SL(∗) is inherited from [11]. The PSpace upper bound can be extended to the fragment of SL(∗, −∗, reach<sup>+</sup>) made of Boolean combinations of formulae from SL(∗, reach<sup>+</sup>) ∪ SL(∗, −∗) (see the developments in Sect. 4). Even better, we show that the fragment of SL(∗, −∗, reach<sup>+</sup>) in which reach<sup>+</sup> is not in the scope of −∗ is decidable. As far as we know, this is the largest fragment including full Boolean expressivity, −∗ and ls for which decidability is established.

### **2 Preliminaries**

Let PVAR $= \{x, y, \ldots\}$ be a countably infinite set of *program variables* and LOC $= \{\ell_0, \ell_1, \ell_2, \ldots\}$ be a countably infinite set of *locations*. A *memory state* is a pair $(s, h)$ such that $s : \mathrm{PVAR} \to \mathrm{LOC}$ is a variable valuation (known as the *store*) and $h : \mathrm{LOC} \to_{\mathrm{fin}} \mathrm{LOC}$ is a partial function with finite domain, known as the *heap*. We write $\mathrm{dom}(h)$ to denote its domain and $\mathrm{ran}(h)$ to denote its range. Given a heap $h$ with $\mathrm{dom}(h) = \{\ell_1, \ldots, \ell_n\}$, we also write $\{\ell_1 \mapsto h(\ell_1), \ldots, \ell_n \mapsto h(\ell_n)\}$ to denote $h$. Each $\ell_i \mapsto h(\ell_i)$ is understood as a *memory cell* of $h$.

As usual, the heaps $h_1$ and $h_2$ are said to be *disjoint*, written $h_1 \perp h_2$, if $\mathrm{dom}(h_1) \cap \mathrm{dom}(h_2) = \emptyset$; when this holds, we write $h_1 + h_2$ to denote the heap corresponding to the disjoint union of the graphs of $h_1$ and $h_2$, hence $\mathrm{dom}(h_1 + h_2) = \mathrm{dom}(h_1) \uplus \mathrm{dom}(h_2)$. When the domains of $h_1$ and $h_2$ are not disjoint, the composition $h_1 + h_2$ is not defined. Moreover, we write $h' \sqsubseteq h$ to denote that $\mathrm{dom}(h') \subseteq \mathrm{dom}(h)$ and for all locations $\ell \in \mathrm{dom}(h')$, we have $h'(\ell) = h(\ell)$. The formulae $\varphi$ of the separation logic SL(∗, −∗, ls) and its atomic formulae $\pi$ are built from $\pi ::= x = y \mid x \hookrightarrow y \mid \mathtt{ls}(x, y) \mid \mathtt{emp} \mid \top$ and $\varphi ::= \pi \mid \neg\varphi \mid \varphi \wedge \varphi \mid \varphi * \varphi \mid \varphi \mathbin{-\!\!*} \varphi$, where $x, y \in \mathrm{PVAR}$ ($\Rightarrow$, $\Leftrightarrow$ and $\vee$ are defined as usual). Models of the logic SL(∗, −∗, ls) are memory states and the satisfaction relation $\models$ is defined as follows (omitting the standard clauses for $\neg$, $\wedge$):
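As a concrete illustration (not from the paper), stores and heaps can be represented by finite Python dictionaries, with the composition $h_1 + h_2$ defined only when the domains are disjoint:

```python
# A store is a dict from variables to locations; a heap is a finite dict
# from locations to locations.
def disjoint(h1, h2):
    return not (h1.keys() & h2.keys())

def compose(h1, h2):
    # h1 + h2 is defined only when the domains are disjoint
    if not disjoint(h1, h2):
        raise ValueError("h1 + h2 undefined: domains overlap")
    return {**h1, **h2}

h1 = {0: 1}
h2 = {1: 2, 2: 2}
assert compose(h1, h2) == {0: 1, 1: 2, 2: 2}
assert not disjoint(h1, {0: 7})        # overlapping domains: no composition
```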

$$\begin{aligned} (s, h) \models x = y &\iff s(x) = s(y)\\ (s, h) \models \mathtt{emp} &\iff \mathrm{dom}(h) = \emptyset\\ (s, h) \models x \hookrightarrow y &\iff s(x) \in \mathrm{dom}(h) \text{ and } h(s(x)) = s(y)\\ (s, h) \models \mathtt{ls}(x, y) &\iff \text{either } (\mathrm{dom}(h) = \emptyset \text{ and } s(x) = s(y)) \text{ or}\\ &\qquad h = \{\ell\_0 \mapsto \ell\_1, \ell\_1 \mapsto \ell\_2, \ldots, \ell\_{n-1} \mapsto \ell\_n\} \text{ with } n \geq 1, \ \ell\_0 = s(x),\\ &\qquad \ell\_n = s(y) \text{ and } \ell\_i \neq \ell\_j \text{ for all } i \neq j \in [0, n]\\ (s, h) \models \varphi\_1 * \varphi\_2 &\iff \text{there are } h\_1, h\_2 \text{ such that } h\_1 \perp h\_2, \ h\_1 + h\_2 = h,\\ &\qquad (s, h\_1) \models \varphi\_1 \text{ and } (s, h\_2) \models \varphi\_2\\ (s, h) \models \varphi\_1 \mathbin{-\!\!*} \varphi\_2 &\iff \text{for all } h\_1, \text{ if } h\_1 \perp h \text{ and } (s, h\_1) \models \varphi\_1 \text{ then } (s, h + h\_1) \models \varphi\_2 \end{aligned}$$

Note that the semantics for $*$, $\mathbin{-\!\!*}$, $\hookrightarrow$, ls and all the other ingredients is the usual one in separation logic, and ls is the *precise* list segment predicate. In the sequel, we use the following abbreviations: $\mathtt{size} \geq 0 \stackrel{\mathrm{def}}{=} \top$ and, for all $\beta \geq 0$, $\mathtt{size} \geq \beta{+}1 \stackrel{\mathrm{def}}{=} (\mathtt{size} \geq \beta) * \neg\mathtt{emp}$, $\mathtt{size} \leq \beta \stackrel{\mathrm{def}}{=} \neg(\mathtt{size} \geq \beta{+}1)$ and $\mathtt{size} = \beta \stackrel{\mathrm{def}}{=} (\mathtt{size} \leq \beta) \wedge (\mathtt{size} \geq \beta)$. Moreover, $\varphi_1 \mathbin{-\!\!\circledast} \varphi_2 \stackrel{\mathrm{def}}{=} \neg(\varphi_1 \mathbin{-\!\!*} \neg\varphi_2)$ (the *septraction connective*), $\mathtt{alloc}(x) \stackrel{\mathrm{def}}{=} (x \hookrightarrow x) \mathbin{-\!\!*} \bot$ and $x \mapsto y \stackrel{\mathrm{def}}{=} (x \hookrightarrow y) \wedge \mathtt{size} = 1$. W.l.o.g., we can assume that LOC $= \mathbb{N}$, since none of the developments depend on the elements of LOC, as the only predicate involving locations is equality. We write SL(∗, −∗) to denote the restriction of SL(∗, −∗, ls) without ls. Similarly, we write SL(∗) to denote the restriction of SL(∗, −∗) without −∗. Given two formulae $\varphi$, $\varphi'$ (possibly from different logical languages), we write $\varphi \equiv \varphi'$ whenever for all $(s, h)$ we have $(s, h) \models \varphi$ iff $(s, h) \models \varphi'$. When $\varphi \equiv \varphi'$, the formulae $\varphi$ and $\varphi'$ are said to be *equivalent*.
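The clauses above can be turned into a small model checker for the magic-wand-free fragment: on a finite heap, $*$ quantifies over finitely many splittings and is directly computable, whereas $-\!\!*$ quantifies over infinitely many extension heaps and is omitted here. This is an illustrative sketch with an ad hoc formula encoding, not a tool from the paper:

```python
from itertools import combinations

# Formulae are tuples, e.g. ('ls', 'x', 'y'), ('sep', p1, p2), ('not', p).
def holds(s, h, phi):
    op = phi[0]
    if op == 'emp':
        return not h
    if op == 'eq':
        return s[phi[1]] == s[phi[2]]
    if op == 'pto':                      # x -> y : weak points-to
        return h.get(s[phi[1]]) == s[phi[2]]
    if op == 'ls':                       # precise list segment
        x, y = s[phi[1]], s[phi[2]]
        if not h:
            return x == y
        loc, seen = x, []
        while loc in h and loc not in seen:
            seen.append(loc)
            loc = h[loc]
        # the path must be acyclic, end in y, and use every cell of h
        return loc == y and loc not in seen and len(seen) == len(h)
    if op == 'not':
        return not holds(s, h, phi[1])
    if op == 'and':
        return holds(s, h, phi[1]) and holds(s, h, phi[2])
    if op == 'sep':                      # enumerate all splittings h = h1 + h2
        dom = list(h)
        for k in range(len(dom) + 1):
            for d1 in combinations(dom, k):
                h1 = {l: h[l] for l in d1}
                h2 = {l: h[l] for l in dom if l not in d1}
                if holds(s, h1, phi[1]) and holds(s, h2, phi[2]):
                    return True
        return False
    raise ValueError(op)

s = {'x': 0, 'y': 2}
h = {0: 1, 1: 2}
assert holds(s, h, ('ls', 'x', 'y'))
# precision of ls: no strict subheap containing a list segment from x to y
assert not holds(s, h, ('sep', ('not', ('emp',)), ('ls', 'x', 'y')))
```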

*Variants with Other Reachability Predicates.* We use two additional reachability predicates $\mathtt{reach}(x, y)$ and $\mathtt{reach}^+(x, y)$, and we write SL(∗, −∗, reach) (resp. SL(∗, −∗, reach<sup>+</sup>)) to denote the variant of SL(∗, −∗, ls) in which ls is replaced by reach (resp. by reach<sup>+</sup>). The relation $\models$ is extended as follows: $(s, h) \models \mathtt{reach}(x, y)$ holds when there is $i \geq 0$ such that $h^i(s(x)) = s(y)$ (where $h^i$ denotes $i$ functional compositions of $h$), and $(s, h) \models \mathtt{reach}^+(x, y)$ holds when there is $i \geq 1$ such that $h^i(s(x)) = s(y)$. As $\mathtt{ls}(x, y) \equiv \mathtt{reach}(x, y) \wedge \neg(\neg\mathtt{emp} * \mathtt{reach}(x, y))$ and $\mathtt{reach}(x, y) \equiv \top * \mathtt{ls}(x, y)$, the logics SL(∗, −∗, reach) and SL(∗, −∗, ls) have the same decidability status. As far as computational complexity is concerned, a similar analysis can be done as soon as $*$, $\neg$, $\wedge$ and emp are part of the fragments (the details are omitted here). Similarly, we have the equivalences $\mathtt{reach}(x, y) \equiv x = y \vee \mathtt{reach}^+(x, y)$ and $\mathtt{ls}(x, y) \equiv (x = y \wedge \mathtt{emp}) \vee (\mathtt{reach}^+(x, y) \wedge \neg(\neg\mathtt{emp} * \mathtt{reach}^+(x, y)))$. So clearly, SL(∗, reach) and SL(∗, ls) can be viewed as fragments of SL(∗, reach<sup>+</sup>), and SL(∗, −∗, ls) as a fragment of SL(∗, −∗, reach<sup>+</sup>). It is therefore stronger to establish decidability or complexity upper bounds with reach<sup>+</sup> and to show undecidability or complexity lower bounds with ls or reach. Herein, we provide the optimal results.
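The predicates reach and reach<sup>+</sup> are directly computable by iterating the heap function: since the heap is finite, a location reachable from $s(x)$ is reachable within $|\mathrm{dom}(h)|$ steps. A hedged sketch:

```python
# reach (i >= 0) and reach+ (i >= 1) by iterating h; the i > len(h) guard
# cuts off cycles, since any reachable location is reached in <= len(h) steps.
def reach(s, h, x, y, strict=False):
    loc, i = s[x], 0
    while True:
        if loc == s[y] and (i >= 1 or not strict):
            return True
        if loc not in h or i > len(h):
            return False
        loc, i = h[loc], i + 1

s = {'x': 0, 'y': 2}
h = {0: 1, 1: 2}
assert reach(s, h, 'x', 'y')                   # h^2(s(x)) = s(y)
assert reach(s, h, 'x', 'x')                   # the empty path, i = 0
assert not reach(s, h, 'x', 'x', strict=True)  # no non-empty path back to x
```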

*Decision Problems.* Let L be a logic defined above. As usual, the *satisfiability problem for* L takes as input a formula ϕ from L and asks whether there is (s, h) such that (s, h) |= ϕ. The *validity problem* is also defined as usual. The *model-checking problem for* L takes as input a formula ϕ from L, (s, h) and asks whether (s, h) |= ϕ (s is restricted to the variables occurring in ϕ and h is encoded as a finite and functional graph). Unless otherwise specified, the *size* of a formula ϕ is understood as its tree size, i.e. approximately its number of symbols.

The main purpose of this paper is to study the decidability/complexity status of SL(∗, −∗, ls) and its fragments.

### **3 Undecidability of SL(***∗, −∗,* ls**)**

In this section, we show that SL(∗, −∗, ls) has an undecidable satisfiability problem even though it does not admit first-order quantification.

Let SL(∀, −∗) be the first-order extension of SL(−∗) obtained by adding the universal quantifier $\forall$. The formulae $\varphi$ of SL(∀, −∗) are built from $\pi ::= x = y \mid x \hookrightarrow y$ and $\varphi ::= \pi \mid \neg\varphi \mid \varphi \vee \varphi \mid \varphi \mathbin{-\!\!*} \varphi \mid \forall x \, \varphi$, where $x, y \in \mathrm{PVAR}$. Note that emp can easily be defined by $\forall x, x' \, \neg(x \hookrightarrow x')$. Models of the logic SL(∀, −∗) are memory states and the satisfaction relation $\models$ is defined as for SL(−∗) with the additional clause:

$(s, h) \models \forall x \, \varphi \iff$ for all $\ell \in \mathrm{LOC}$, we have $(s[x \leftarrow \ell], h) \models \varphi$.

Without any loss of generality, we can assume that the satisfiability [resp. validity] problem for SL(∀, −∗) is defined by taking as inputs closed formulae (i.e. without free occurrences of variables).

**Proposition 1.** *[5,14] The satisfiability problem for* SL(∀, −∗) *is undecidable and the set of valid formulae for* SL(∀, −∗) *is not recursively enumerable.*

In a nutshell, we establish the undecidability of SL(∗, −∗, ls) by reduction from the satisfiability problem for SL(∀, −∗). The reduction is nicely decomposed into two intermediate steps: (1) the undecidability of SL(∗, −∗) extended with a few atomic predicates, to be defined soon, and (2) a *tour de force* resulting in the encoding of these atomic predicates in SL(∗, −∗, ls).

#### **3.1 Encoding Quantified Variables as Cells in the Heap**

In this section, we assume for a moment that we can express three atomic predicates $\mathtt{alloc}^{-1}(x)$, $n(x) = n(y)$ and $n(x) \hookrightarrow n(y)$, which will be used in the translation and have the following semantics:


Let us first intuitively explain how the last two predicates will help in encoding SL(∀, −∗). By definition, the satisfaction of the quantified formula $\forall x \, \psi$ from SL(∀, −∗) requires the satisfaction of the formula $\psi$ for all the values in LOC assigned to $x$. The principle of the encoding is to use a set $L$ of locations, initially not in the domain or range of the heap, to mimic the store by modifying how they are allocated. In this way, a variable will be interpreted by a location in the heap and, instead of checking whether $x \hookrightarrow y$ (or $x = y$) holds, we will check whether $n(x) \hookrightarrow n(y)$ (or $n(x) = n(y)$) holds, where $x$ and $y$ correspond, after the translation, to the locations in $L$ that mimic the store for those variables. Let $X$ be the set of variables needed for the translation. In order to properly encode the store, each location in $L$ mimics exactly one variable, i.e. there is a bijection between $X$ and $L$, and cannot be reached by any location. As such, the formula $\forall x \, \psi$ will be encoded by the formula $(\mathtt{alloc}(x) \wedge \mathtt{size} = 1) \mathbin{-\!\!*} (\mathrm{OK}(X) \Rightarrow \mathrm{T}(\psi))$, where $\mathrm{OK}(X)$ (formally defined below) checks whether the locations in $L$ still satisfy the auxiliary conditions just described, whereas $\mathrm{T}(\psi)$ is the translation of $\psi$.

Unfortunately, the formula $\psi_1 \mathbin{-\!\!*} \psi_2$ cannot simply be translated into $\mathrm{T}(\psi_1) \mathbin{-\!\!*} (\mathrm{OK}(X) \Rightarrow \mathrm{T}(\psi_2))$, because the evaluation of $\mathrm{T}(\psi_1)$ in a disjoint heap may need the values of free variables occurring in $\psi_1$, but our encoding of the variable valuations via the heap does not allow us to preserve these values through disjoint heaps. In order to solve this problem, for each variable $x$ in the formula, $X$ will contain an auxiliary variable $\overline{x}$; alternatively, we define on $X$ an involution $\overline{(\cdot)}$. If the translated formula has $q$ variables then the set $X$ of variables needed for the translation will have cardinality $2q$. In the translation of a formula whose outermost connective is the magic wand, the locations corresponding to variables of the form $\overline{x}$ will be allocated on the left side of the magic wand, and checked to be equal to their non-bar versions on the right side of the magic wand. As such, the left side of the magic wand will be translated into

$$\Big(\big(\bigwedge\_{\mathbf{z}\in Z} \mathtt{alloc}(\overline{\mathbf{z}})\big) \wedge \big(\bigwedge\_{\mathbf{z}\in X \setminus Z} \neg\mathtt{alloc}(\overline{\mathbf{z}})\big) \wedge \mathrm{OK}(X) \wedge \mathrm{T}(\psi\_1)[\mathbf{z} \leftarrow \overline{\mathbf{z}} \mid \mathbf{z} \in X]\Big),$$

where Z is the set of free variables in ψ1, whereas the right side will be

$$\Big(\big(\big(\bigwedge\_{\mathbf{z}\in Z} n(\mathbf{z}) = n(\overline{\mathbf{z}})\big) \wedge \mathrm{OK}(X)\big) \Rightarrow \big(\big(\bigwedge\_{\mathbf{z}\in Z} \mathtt{alloc}(\overline{\mathbf{z}}) \wedge \mathtt{size} = \mathrm{card}(Z)\big) \ast \mathrm{T}(\psi\_2)\big)\Big).$$

The use of the separating conjunction before the formula T(ψ2) separates the memory cells corresponding to the variables z̄ from the rest of the heap. By doing this, we can reuse these variables whenever a magic wand appears in T(ψ2).

For technical convenience, we consider a slightly different presentation of the semantics of the logics SL(∀, −∗) and SL(∗, −∗, ls), which does not modify the notion of satisfiability/validity; the set of formulae and the definition of the satisfaction relation |= remain unchanged. So far, the memory states are pairs of the form (s, h) with s : PVAR → LOC and h : LOC →fin LOC for a *fixed* countably infinite set of locations LOC, say LOC = ℕ. Alternatively, the models for SL(∀, −∗) and SL(∗, −∗, ls) can be defined as triples (LOC1, s1, h1) such that LOC1 is a countably infinite set, s1 : PVAR → LOC1 and h1 : LOC1 →fin LOC1. As shown below, this does not change the notion of satisfiability and validity, but the generalisation will be handy in a few places. Most of the time, a generalised memory state (LOC1, s1, h1) shall be written (s1, h1) when no confusion is possible.

Given a bijection f : LOC1 → LOC2 and a heap h1 : LOC1 →fin LOC1 equal to {ℓ1 ↦ h1(ℓ1), ..., ℓn ↦ h1(ℓn)}, we write f(h1) to denote the heap h2 : LOC2 →fin LOC2 with h2 = {f(ℓ1) ↦ f(h1(ℓ1)), ..., f(ℓn) ↦ f(h1(ℓn))}.
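Concretely, a heap can be modelled as a finite dictionary, and f(h1) simply renames both components of every memory cell. A minimal sketch (the function name `rename_heap` is ours, not from the paper):

```python
# A heap is a finite partial function LOC -> LOC, modelled as a dict.
# Given a bijection f (also a dict), f(h) renames every memory cell
# l -> h(l) into f(l) -> f(h(l)).

def rename_heap(f, h):
    """Return the heap f(h) = {f(l) -> f(h(l)) | l in dom(h)}."""
    return {f[l]: f[h[l]] for l in h}

# Example: the bijection f swaps 1 <-> 2 on LOC = {1, 2, 3}.
f = {1: 2, 2: 1, 3: 3}
h1 = {1: 3, 3: 3}          # cells 1 -> 3 and 3 -> 3
h2 = rename_heap(f, h1)
print(h2)                  # {2: 3, 3: 3}
```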

**Definition 1.** *Let* (LOC1, s1, h1) *and* (LOC2, s2, h2) *be generalised memory states and* X ⊆ PVAR*. A* partial isomorphism *with respect to* X *from* (LOC1, s1, h1) *to* (LOC2, s2, h2) *is a bijection* f : LOC1 → LOC2 *such that* h2 = f(h1) *and for all* x ∈ X*,* f(s1(x)) = s2(x) *(we write* (LOC1, s1, h1) ≈X (LOC2, s2, h2)*).*

A folklore result states that isomorphic memory states satisfy the same formulae since the logics SL(∀, −∗), SL(∗, −∗, ls) can only perform equality tests.

**Lemma 1.** *Let* (LOC1, s1, h1) *and* (LOC2, s2, h2) *be two generalised memory states such that* (LOC1, s1, h1) ≈X (LOC2, s2, h2)*, for some* X ⊆ PVAR*. (I) For all formulae* ϕ *in* SL(∀, −∗) *whose free variables are among* X*, we have* (LOC1, s1, h1) |= ϕ *iff* (LOC2, s2, h2) |= ϕ*. (II) For all formulae* ϕ *in* SL(∗, −∗, ls) *built on variables among* X*, we have* (LOC1, s1, h1) |= ϕ *iff* (LOC2, s2, h2) |= ϕ*.*

As a direct consequence, satisfiability in SL(∗, −∗, ls) as defined in Sect. 2 is equivalent to satisfiability with generalised memory states; the same holds for SL(∀, −∗). Next, we define the encoding of a generalised memory state. This can be seen as the semantical counterpart of the syntactical translation process and, as such, formalises the intuition of using part of a heap to mimic the store.

**Definition 2.** *Let* X = {x1, ..., x2q}*,* Y ⊆ {x1, ..., xq}*, and let* (LOC1, s1, h1) *and* (LOC2, s2, h2) *be two (generalised) memory states. We say that* (LOC1, s1, h1) *is encoded by* (LOC2, s2, h2) *w.r.t.* X, Y*, written* (LOC1, s1, h1) ▷^Y_q (LOC2, s2, h2)*, if the following conditions hold:*


Notice that h2 is equal to h1 plus the heap {s2(x) ↦ s1(x) | x ∈ Y} that encodes the store s1. The picture below presents a memory state (left) and its encoding (right), where Y = {xi, xj, xk}. From the encoding, we can retrieve the initial heap by removing the memory cells corresponding to xi, xj and xk. By way of example, the memory state on the left satisfies the formulae xi = xj, xi ↪ xk and xk ↪ xk, whereas its encoding satisfies the formulae n(xi) = n(xj), n(xi) ↪ n(xk) and n(xk) ↪ n(xk).

#### **3.2 The Translation**

We are now ready to define the translation of a first-order formula into propositional separation logic extended with the three predicates introduced at the beginning of the section. Let ϕ be a closed formula of SL(∀, −∗) with quantified variables {x1, ..., xq}. W.l.o.g., we can assume that distinct quantifications involve distinct variables. Moreover, let X = {x1, ..., x2q} and let (·̄) be the involution on X such that for all i ∈ [1, q], x̄i is defined as xi+q.

We write OK(X) to denote the formula (⋀i≠j xi ≠ xj) ∧ (⋀i ¬alloc⁻¹(xi)). The translation function T has two arguments: the formula of SL(∀, −∗) to be recursively translated, and the total set of variables potentially appearing in the target formula (useful to check that OK(X) holds on every heap involved in the satisfaction of the translated formula). Let us now give the definition of T(ψ, X) (homomorphic for Boolean connectives), under the assumption that the variables in ψ are among x1, ..., xq.

$$\begin{aligned} \mathrm{T}(\mathbf{x}\_i = \mathbf{x}\_j, X) &\stackrel{\text{def}}{=} n(\mathbf{x}\_i) = n(\mathbf{x}\_j) \\ \mathrm{T}(\mathbf{x}\_i \hookrightarrow \mathbf{x}\_j, X) &\stackrel{\text{def}}{=} n(\mathbf{x}\_i) \hookrightarrow n(\mathbf{x}\_j) \\ \mathrm{T}(\forall \mathbf{x}\_i\ \psi, X) &\stackrel{\text{def}}{=} (\mathtt{alloc}(\mathbf{x}\_i) \wedge \mathtt{size} = 1) \twoheadrightarrow (\mathrm{OK}(X) \Rightarrow \mathrm{T}(\psi, X)) \end{aligned}$$

Lastly, the translation T(ψ<sup>1</sup> −∗ ψ2, X) is defined as

$$\Big(\big(\bigwedge\_{\mathbf{z}\in Z} \mathtt{alloc}(\overline{\mathbf{z}})\big) \wedge \big(\bigwedge\_{\mathbf{z}\in X \setminus Z} \neg\mathtt{alloc}(\overline{\mathbf{z}})\big) \wedge \mathrm{OK}(X) \wedge \mathrm{T}(\psi\_1, X)[\mathbf{x} \leftarrow \overline{\mathbf{x}}]\Big) \twoheadrightarrow$$

$$\Big(\big(\big(\bigwedge\_{\mathbf{z}\in Z} n(\mathbf{z}) = n(\overline{\mathbf{z}})\big) \wedge \mathrm{OK}(X)\big) \Rightarrow \big(\big(\bigwedge\_{\mathbf{z}\in Z} \mathtt{alloc}(\overline{\mathbf{z}}) \wedge \mathtt{size} = \mathrm{card}(Z)\big) \ast \mathrm{T}(\psi\_2, X)\big)\Big),$$

where Z ⊆ {x1, ..., xq} is the set of free variables in ψ1.
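The clauses above can be transcribed as a purely syntactic sketch over a tuple-based AST. All encoding choices below (tuple shapes, variable names 'x1', ..., atom tags such as 'n_eq' and 'OK') are ours, and the clause for −∗ is simplified for brevity (the ¬alloc conjuncts are omitted and n-ary conjunctions are used):

```python
def bar(v, q):
    """The involution on X = {x1, ..., x2q}: bar('xi') = 'x(i+q)' and back."""
    i = int(v[1:])
    return 'x%d' % (i - q if i > q else i + q)

def free_vars(phi):
    op = phi[0]
    if op in ('eq', 'pto'):
        return {phi[1], phi[2]}
    if op == 'not':
        return free_vars(phi[1])
    if op in ('and', 'wand'):
        return free_vars(phi[1]) | free_vars(phi[2])
    if op == 'forall':
        return free_vars(phi[2]) - {phi[1]}

def subst_bar(t, q):
    """Apply [z <- bar(z)] to every variable occurring in a target formula."""
    if isinstance(t, str) and t.startswith('x'):
        return bar(t, q)
    if isinstance(t, tuple):
        return tuple(subst_bar(u, q) for u in t)
    return t

def T(phi, q):
    op = phi[0]
    if op == 'eq':                       # x = y   ~~>  n(x) = n(y)
        return ('n_eq', phi[1], phi[2])
    if op == 'pto':                      # x -> y  ~~>  n(x) -> n(y)
        return ('n_pto', phi[1], phi[2])
    if op == 'not':
        return ('not', T(phi[1], q))
    if op == 'and':
        return ('and', T(phi[1], q), T(phi[2], q))
    if op == 'forall':                   # (alloc(x) /\ size = 1) -* (OK => T(psi))
        return ('wand', ('and', ('alloc', phi[1]), ('size_eq', 1)),
                        ('imp', 'OK', T(phi[2], q)))
    if op == 'wand':                     # the two-sided clause for psi1 -* psi2
        Z = sorted(free_vars(phi[1]))    # free variables of psi1
        left = ('and', tuple(('alloc', bar(z, q)) for z in Z),
                       'OK', subst_bar(T(phi[1], q), q))
        right = ('imp',
                 ('and', tuple(('n_eq', z, bar(z, q)) for z in Z), 'OK'),
                 ('sep', ('and', tuple(('alloc', bar(z, q)) for z in Z),
                                 ('size_eq', len(Z))),
                         T(phi[2], q)))
        return ('wand', left, right)

# The clause for universal quantification, on forall x1. x1 = x2 with q = 2:
print(T(('forall', 'x1', ('eq', 'x1', 'x2')), 2))
# -> the (alloc(x1) /\ size = 1) -* (OK => n(x1) = n(x2)) clause
```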

Here is the main result of this section, which is essential for the correctness of TSAT(ϕ), defined below.

**Lemma 2.** *Let* X = {x1, ..., x2q}*,* Y ⊆ {x1, ..., xq}*, let* ψ *be a formula in* SL(∀, −∗) *whose free variables are among* Y *and such that* Y *does not contain any bound variable of* ψ*, and let* (LOC1, s1, h1) ▷^Y_q (LOC2, s2, h2)*. We have* (s1, h1) |= ψ *iff* (s2, h2) |= T(ψ, X)*.*

We define the translation <sup>T</sup>SAT(ϕ) in SL(∗, −∗, ls) where T(ϕ, X) is defined recursively.

$$\mathrm{T}\_{\mathrm{SAT}}(\varphi) \stackrel{\text{def}}{=} \big(\bigwedge\_{i \in [1, 2q]} \neg\mathtt{alloc}(\mathbf{x}\_i)\big) \wedge \mathrm{OK}(X) \wedge \mathrm{T}(\varphi, X).$$

The first two conjuncts specify initial conditions: each variable y in X is interpreted by a location that is unallocated, not in the heap range, and distinct from the interpretation of all other variables; in other words, the value for y is isolated. Similarly, let TVAL(ϕ) be the formula in SL(∗, −∗, ls) defined by ((⋀i∈[1,2q] ¬alloc(xi)) ∧ OK(X)) ⇒ T(ϕ, X). As a consequence of Lemma 2, ϕ and TSAT(ϕ) are shown equisatisfiable, whereas ϕ and TVAL(ϕ) are shown equivalid.

**Corollary 1.** *Let* ϕ *be a closed formula in* SL(∀, −∗) *using quantified variables among* {x1, ..., xq}*. (I)* ϕ *and* TSAT(ϕ) *are equisatisfiable. (II)* ϕ *and* TVAL(ϕ) *are equivalid.*

#### **3.3 Expressing the Auxiliary Atomic Predicates**

To complete the reduction, we briefly explain how to express the formulae alloc⁻¹(x), n(x) = n(y) and n(x) ↪ n(y) within SL(∗, −∗, ls). Let us introduce a few macros that will be helpful.


In order to express the existence of a predecessor (i.e. alloc⁻¹(x)) in SL(∗, −∗, ls), we need to take advantage of an auxiliary variable y whose value is different from the one for x. Let alloc⁻¹_y(x) be the formula

$$\mathbf{x} \hookrightarrow \mathbf{x} \vee \mathbf{y} \hookrightarrow \mathbf{x} \vee [(\mathtt{alloc}(\mathbf{y}) \wedge \neg(\mathbf{y} \hookrightarrow \mathbf{x}) \wedge \mathtt{size} = 1) \twoheadrightarrow \mathtt{reach}(\mathbf{y}, \mathbf{x}) = 2]\_1.$$

**Lemma 3.** *Let* x, y ∈ PVAR*. (I) For all memory states* (s, h) *such that* s(x) ≠ s(y)*, we have* (s, h) |= alloc⁻¹_y(x) *iff* s(x) ∈ ran(h)*. (II) In the translation,* alloc⁻¹(x) *can be replaced with* alloc⁻¹_x̄(x)*.*

As stated in Lemma 3(II), we can exploit the fact that the translation of a formula with variables in {x1, ..., xq} uses 2q variables corresponding to 2q distinguished locations in the heap, in order to retain the soundness of the translation while using alloc⁻¹_x̄(x) in place of alloc⁻¹(x). Moreover, alloc⁻¹_y(x) allows us to express in SL(∗, −∗, ls) that a location corresponding to a program variable reaches itself in exactly two steps (we use this property in the definition of n(x) ↪ n(y)). We write x ↪²_y x to denote the formula ¬(x ↪ x) ∧ (x ↪ y ⇔ y ↪ x) ∧ [alloc(x) ∧ alloc⁻¹_y(x) ∧ (⊤ −∗ ¬ reach(x, y) = 2)]2. For any memory state (s, h) such that s(x) ≠ s(y), we have (s, h) |= x ↪²_y x if and only if h²(s(x)) = s(x) and h(s(x)) ≠ s(x).

The predicate <sup>n</sup>(x) = <sup>n</sup>(y) can be defined in SL(∗, −∗, ls) as

$$\mathbf{x} \neq \mathbf{y} \Rightarrow \big[\mathtt{alloc}(\mathbf{x}) \wedge \mathtt{alloc}(\mathbf{y}) \wedge \big((\mathbf{x} \hookrightarrow \mathbf{y} \wedge \mathbf{y} \hookrightarrow \mathbf{y}) \vee (\mathbf{y} \hookrightarrow \mathbf{x} \wedge \mathbf{x} \hookrightarrow \mathbf{x}) \vee \big(\big(\bigwedge\_{\mathbf{z}, \mathbf{z}' \in \{\mathbf{x}, \mathbf{y}\}} \neg(\mathbf{z} \hookrightarrow \mathbf{z}')\big) \wedge \big(\top \dashrightarrow \neg(\mathtt{reach}(\mathbf{x}, \mathbf{y}) = 2 \wedge \mathtt{reach}(\mathbf{y}, \mathbf{x}) = 2)\big)\big)\big)\big]$$

**Lemma 4.** *Let* x, y ∈ PVAR*. For all memory states* (s, h)*, we have* (s, h) |= n(x) = n(y) *iff* h(s(x)) = h(s(y))*.*

Similarly to alloc⁻¹(x), we can show that n(x) ↪ n(y) is definable in SL(∗, −∗, ls) by using one additional variable z whose value is different from the values of both x and y. Let ϕ↪(x, y, z) be (n(x) = n(y) ∧ ϕ↪=(x, y, z)) ∨ (n(x) ≠ n(y) ∧ ϕ↪≠(x, y)), where ϕ↪=(x, y, z) is defined as

$$\begin{aligned} & (\mathbf{x} \hookrightarrow \mathbf{x} \wedge \mathbf{y} \hookrightarrow \mathbf{x}) \vee (\mathbf{y} \hookrightarrow \mathbf{y} \wedge \mathbf{x} \hookrightarrow \mathbf{y}) \vee (\mathbf{x} \hookrightarrow \mathbf{z} \wedge \mathbf{z} \hookrightarrow \mathbf{z}) \\ & \vee\ [\mathtt{alloc}(\mathbf{x}) \wedge \neg\mathtt{alloc}^{-1}\_{\mathbf{z}}(\mathbf{x}) \wedge (\top \dashrightarrow \neg\,\mathtt{reach}(\mathbf{x}, \mathbf{z}) \leq 3)]\_2 \end{aligned}$$

whereas ϕ↪≠(x, y) is defined as

$$\begin{aligned} & (\mathbf{x} \hookrightarrow \mathbf{y} \wedge \mathtt{alloc}(\mathbf{y})) \vee (\mathbf{y} \hookrightarrow \mathbf{y} \wedge \mathtt{reach}(\mathbf{x}, \mathbf{y}) = 2) \vee (\mathbf{y} \hookrightarrow \mathbf{x} \wedge \mathbf{x} \hookrightarrow^2\_{\mathbf{y}} \mathbf{x}) \vee {} \\ & \big[\mathtt{alloc}(\mathbf{x}) \wedge \mathtt{alloc}(\mathbf{y}) \wedge \big(\bigwedge\_{\mathbf{z}, \mathbf{z}' \in \{\mathbf{x}, \mathbf{y}\}} \neg\, \mathbf{z} \hookrightarrow \mathbf{z}'\big) \wedge \neg\,\mathtt{reach}(\mathbf{x}, \mathbf{y}) \leq 3 \\ & \quad \wedge \big((\mathtt{size} = 1 \wedge \mathtt{alloc}^{-1}\_{\mathbf{x}}(\mathbf{y})) \dashrightarrow (\mathtt{reach}(\mathbf{x}, \mathbf{y}) = 3 \wedge \mathbf{y} \hookrightarrow^2\_{\mathbf{x}} \mathbf{y})\big)\big]\_3 \end{aligned}$$

**Lemma 5.** *Let* x, y, z ∈ PVAR*. (I) For all memory states* (s, h) *such that* s(x) ≠ s(z) *and* s(y) ≠ s(z)*, we have* (s, h) |= ϕ↪(x, y, z) *iff* {s(x), s(y)} ⊆ dom(h) *and* h(h(s(x))) = h(s(y))*. (II) In the translation,* n(x) ↪ n(y) *can be replaced by* ϕ↪(x, y, x̄)*.*

As for alloc⁻¹_y(x), the properties of the translation imply the equivalence between n(x) ↪ n(y) and ϕ↪(x, y, x̄) (as stated in Lemma 5(II)). In the formulae defined herein, the predicate reach only appears bounded, i.e. in the form reach(x, y) = 2 or reach(x, y) = 3. The three new predicates can therefore be defined in SL(∗, −∗) enriched with reach(x, y) = 2 and reach(x, y) = 3.

#### **3.4 Undecidability Results and Non-finite Axiomatization**

It is time to collect the fruits of all our efforts and to conclude this part about undecidability. As a direct consequence of Corollary 1 and the undecidability of SL(∀, −∗), here is one of the main results of the paper.

**Theorem 1.** *The satisfiability problem for* SL(∗, −∗, ls) *is undecidable.*

As a by-product, the set of valid formulae for SL(∗, −∗, ls) is not recursively enumerable. Indeed, suppose the set of valid formulae for SL(∗, −∗, ls) were r.e.; then one could enumerate the valid formulae of the form TVAL(ϕ), as it is decidable in PTime whether a formula ψ in SL(∗, −∗, ls) is syntactically equal to TVAL(ϕ) for some SL(∀, −∗) formula ϕ. This leads to a contradiction, since it would allow the enumeration of the valid formulae of SL(∀, −∗).

The essential ingredients to establish the undecidability of SL(∗, −∗, ls) are the facts that the properties n(x) = n(y), n(x) ↪ n(y) and alloc⁻¹(x) are expressible in the logic.

**Corollary 2.** SL(∗, −∗) *augmented with built-in formulae of the form* n(x) = n(y)*,* n(x) ↪ n(y) *and* alloc⁻¹(x) *(resp. of the form* reach(x, y) = 2 *and* reach(x, y) = 3*) admits an undecidable satisfiability problem.*

It is the addition of reach(x, y) = 3 that is crucial for undecidability, since the satisfiability problem for SL(∗, −∗, reach(x, y) = 2) is in PSpace [15]. Following a similar analysis, let SL1(∀, ∗, −∗) be the restriction of SL(∀, ∗, −∗) (i.e. SL(∀, −∗) plus ∗) to formulae of the form ∃x1 ··· ∃xq ϕ, where q ≥ 1, the variables in ϕ are among {x1, ..., xq+1} and the only quantified variable in ϕ is xq+1. The satisfiability problem for SL1(∀, ∗, −∗) is PSpace-complete [15]. Note that SL1(∀, ∗, −∗) can easily express n(x) = n(y) and alloc⁻¹(x). The distance between the decidability for SL1(∀, ∗, −∗) and the undecidability for SL(∗, −∗, ls) is best witnessed by the corollary below, which solves an open problem [15, Sect. 6].

**Corollary 3.** SL1(∀, ∗, −∗) *augmented with* n(x) ↪ n(y) *(resp.* SL1(∀, ∗, −∗) *augmented with* ls*) admits an undecidable satisfiability problem.*

# **4 SL(***∗,* reach**+) and Other PSPACE Variants**

As already seen in Sect. 2, SL(∗, ls) can be understood as a fragment of SL(∗, reach+). Below, we show that the satisfiability problem for SL(∗, reach+) can be solved in polynomial space. Refining the arguments used in our proof, we also show the decidability of the fragment of SL(∗, −∗, reach+) in which reach+ is constrained not to occur in the scope of −∗, i.e. ϕ belongs to that fragment iff for every subformula ψ of ϕ of the form ψ1 −∗ ψ2, reach+ occurs neither in ψ1 nor in ψ2.

The proof relies on a small heap property: a formula ϕ is satisfiable if and only if it admits a model with a polynomial amount of memory cells. The PSpace upper bound then follows by establishing that the model-checking problem for SL(∗, reach+) is in PSpace too. To establish the small heap property, an equivalence relation on memory states with finite index is designed, following the standard approach in [10,32] and using test formulae as in [4,15,22,23].
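To see why these two ingredients yield a decision procedure, here is a toy sketch (entirely ours, not the algorithm developed in the paper): enumerate all heaps up to a size bound and model check, for a minimal fragment with emp, exact points-to, Boolean connectives and ∗. The enumeration is exponential and serves only to illustrate how a small model bound gives decidability:

```python
from itertools import chain, combinations, product

def models(s, h, phi):
    """Naive model checking for a tiny fragment: emp, exact points-to,
    conjunction, negation and the separating conjunction."""
    op = phi[0]
    if op == 'emp':
        return h == {}
    if op == 'pto':                       # heap is exactly one cell s(x) -> s(y)
        return h == {s[phi[1]]: s[phi[2]]}
    if op == 'and':
        return models(s, h, phi[1]) and models(s, h, phi[2])
    if op == 'not':
        return not models(s, h, phi[1])
    if op == 'sep':                       # try every split h = h1 + h2
        cells = list(h.items())
        for r in range(len(cells) + 1):
            for part in combinations(cells, r):
                h1 = dict(part)
                h2 = {k: v for k, v in cells if k not in h1}
                if models(s, h1, phi[1]) and models(s, h2, phi[2]):
                    return True
        return False
    raise ValueError(op)

def satisfiable(phi, s, bound):
    """Enumerate every heap whose domain and range lie in [0, bound)."""
    locs = range(bound)
    doms = chain.from_iterable(combinations(locs, r) for r in range(bound + 1))
    return any(models(s, dict(zip(d, img)), phi)
               for d in doms for img in product(locs, repeat=len(d)))

# x -> y * y -> x is satisfiable with two cells; emp /\ x -> y is not.
s = {'x': 0, 'y': 1}
print(satisfiable(('sep', ('pto', 'x', 'y'), ('pto', 'y', 'x')), s, 2))  # True
print(satisfiable(('and', ('emp',), ('pto', 'x', 'y')), s, 2))           # False
```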

#### **4.1 Introduction to Test Formulae**

Before presenting the test formulae for SL(∗, reach+), let us recall the standard result for SL(∗, −∗), which will also be used at some point later on.

**Proposition 2.** *[22,32] Any formula* ϕ *in* SL(∗, −∗) *built over the variables* x1, ..., xq *is logically equivalent to a Boolean combination of formulae among* xi = xj*,* alloc(xi)*,* xi ↪ xj *and* size ≥ β *(*i, j ∈ {1, ..., q}*,* β ∈ ℕ*).*

By way of example, ¬emp ∗ ((x1 ↪ x2) −∗ ⊥) is equivalent to size ≥ 2 ∧ alloc(x1). As a corollary of the proof of Proposition 2, in size ≥ β we can enforce that β ≤ 2 × |ϕ| (a rough upper bound), where |ϕ| is the size of ϕ. Similar results will be shown for SL(∗, reach+) and for some of its extensions.

In order to define a set of test formulae that captures the expressive power of SL(∗, reach+), we need to study which basic properties on memory states can be expressed by SL(∗, reach+) formulae. For example, consider the memory states from Fig. 1.

The memory states (s1, h1) and (s2, h2) can be distinguished by the formula ⊤ ∗ (reach(xi, xj) ∧ reach(xj, xk) ∧ ¬reach(xk, xi)). Indeed, (s1, h1) satisfies this formula by considering a subheap that does not contain a path from s(xk) to s(xi), whereas it is impossible to find a subheap of (s2, h2) that retains the path from s(xi) to s(xj) and the one from s(xj) to s(xk) but loses the path from s(xk) to s(xi). This suggests that SL(∗, reach+) can express

**Fig. 1.** Memory states (*s*1*, h*1), . . . , (*s*4*, h*4) (from left to right)

whether, for example, any path from s(xi) to s(xj) also contains s(xk). We will introduce the test formulae seesq(xi, xj) ≥ β to capture this kind of property.

Similarly, the memory states (s3, h3) and (s4, h4) can be distinguished by the formula (size = 1) ∗ (reach(xj, xk) ∧ ¬reach(xi, xk) ∧ ¬reach+(xk, xk)). The memory state (s3, h3) satisfies this formula by separating a single memory cell from the rest of the heap, whereas the formula is not satisfied by (s4, h4). Indeed, there is no way to break the loop from s(xk) to itself by removing just one memory cell from the heap while retaining the path from s(xj) to s(xk) and losing the path from s(xi) to s(xk). This suggests that the two locations involved are particularly interesting, since they are reachable from several locations corresponding to program variables; by separating them from the rest of the heap, several paths are lost at once. In order to capture this, we introduce the notion of *meet-points*.

Let Termsq be the set {x1, ..., xq} ∪ {mq(xi, xj) | i, j ∈ [1, q]}, understood as the set of *terms* that are either variables or expressions denoting a meet-point. We write [[xi]]^q_{s,h} to denote s(xi), and [[mq(xi, xj)]]^q_{s,h} to denote (if it exists) the first location reachable from s(xi) that is also reachable from s(xj). Moreover, we require that this location can reach another location corresponding to a program variable. Formally, [[mq(xi, xj)]]^q_{s,h} is defined as the unique location ℓ such that


These conditions hold for at most one location ℓ, and one can easily show that the notation [[mq(xi, xj)]]^q_{s,h} is well-defined. The picture below provides a taxonomy of meet-points, where arrows labelled by '+' represent paths of non-zero length and zig-zag arrows represent arbitrary paths (possibly of zero length). Symmetrical cases, obtained by swapping xi and xj, are omitted.

Notice how the asymmetry in the definition of meet-points is captured by the two rightmost heaps. Considering the memory states from Fig. 1, (s3, h3) and (s4, h4) can be seen as instances of the third case of the taxonomy and, as such, the meet-points [[mq(xi, xj)]]^q_{s3,h3} and [[mq(xj, xi)]]^q_{s3,h3} are precisely the two distinct locations discussed above.
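On concrete heaps represented as dictionaries, the meet-point computation can be sketched as follows. The function names are ours, and the code follows the informal description above (first location on the path from s(xi) that is reachable from s(xj) and can reach the interpretation of some program variable), not the formal conditions:

```python
def reach_from(h, l):
    """All locations reachable from l in heap h, including l itself."""
    seen = set()
    while l not in seen:
        seen.add(l)
        if l not in h:
            break
        l = h[l]
    return seen

def meet_point(s, h, xi, xj):
    vals = set(s.values())
    from_j = reach_from(h, s[xj])
    l, visited = s[xi], set()
    while l not in visited:
        visited.add(l)
        if l in from_j and reach_from(h, l) & vals:
            return l                     # first such location on the path
        if l not in h:
            break
        l = h[l]
    return None                          # the meet-point is undefined

# Paths from xi and xj merge at location 3, which reaches s(xk) = 5.
s = {'xi': 1, 'xj': 2, 'xk': 5}
h = {1: 3, 2: 3, 3: 5, 5: 5}
print(meet_point(s, h, 'xi', 'xj'))      # 3
```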

Given q, α ≥ 1, we write Test(q, α) to denote the following set of atomic formulae (also called *test formulae*):

$$v = v' \qquad v \hookrightarrow v' \qquad \mathtt{alloc}(v) \qquad \mathtt{sees}\_q(v, v') \geq \beta + 1 \qquad \mathtt{sizeR}\_q \geq \beta,$$

where v, v′ ∈ Termsq and β ∈ [1, α]. It is worth noting that the formulae alloc(v) are not needed for the logic SL(∗, reach+), but they are required for its extensions.

We identify as special locations the s(xi)'s and the meet-points of the form [[mq(xi, xj)]]^q_{s,h}, when they exist (i, j ∈ [1, q]). We call such locations *labelled locations*, and the set of labelled locations is written Labels^q_{s,h}. The formal semantics of the test formulae is provided below:

$$\begin{array}{lcl} (s,h) \models v = v' &\iff& [\![v]\!]^q\_{s,h} \text{ and } [\![v']\!]^q\_{s,h} \text{ are defined and } [\![v]\!]^q\_{s,h} = [\![v']\!]^q\_{s,h} \\ (s,h) \models \mathtt{alloc}(v) &\iff& [\![v]\!]^q\_{s,h} \text{ is defined and belongs to } \mathrm{dom}(h) \\ (s,h) \models v \hookrightarrow v' &\iff& h([\![v]\!]^q\_{s,h}) = [\![v']\!]^q\_{s,h} \\ (s,h) \models \mathtt{sees}\_q(v,v') \geq \beta+1 &\iff& \exists\, L \geq \beta+1,\ h^L([\![v]\!]^q\_{s,h}) = [\![v']\!]^q\_{s,h} \text{ and} \\ && \forall\, 0 < L' < L,\ h^{L'}([\![v]\!]^q\_{s,h}) \notin \mathrm{Labels}^q\_{s,h} \\ (s,h) \models \mathtt{sizeR}\_q \geq \beta &\iff& \mathrm{card}(\mathrm{Rem}^q\_{s,h}) \geq \beta \end{array}$$

where Rem^q_{s,h} is the set of locations that neither belong to a path between two locations interpreted by program variables nor are equal to such interpretations, i.e. Rem^q_{s,h} is defined as {ℓ ∈ dom(h) | ∀i ∈ [1, q], s(xi) ≠ ℓ, and for all i, j ∈ [1, q] there are no L, L′ ≥ 1 with h^L(s(xi)) = ℓ and h^{L′}(ℓ) = s(xj)}. There is no need for test formulae of the form seesq(v, v′) ≥ 1, since they are equivalent to v ↪ v′ ∨ seesq(v, v′) ≥ 2. One can check whether [[mq(xi, xj)]]^q_{s,h} is defined thanks to the formula mq(xi, xj) = mq(xi, xj). The formula sizeRq ≥ β states that the cardinality of the set Rem^q_{s,h} is at least β. Furthermore, seesq(v, v′) ≥ β + 1 states that there is a minimal path between v and v′ of length at least β + 1 and that, strictly between v and v′, there are no labelled locations. The satisfaction of seesq(v, v′) ≥ β + 1 thus entails the exclusion of labelled locations from the witness path, which is reminiscent of the restricted reachability predicates of the logic GRASS [26]. So, the test formulae are quite expressive, since they capture the atomic formulae of SL(∗, reach+) as well as the test formulae for SL(∗, −∗).
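The set Rem^q_{s,h} can be computed directly on a dictionary representation of heaps. The following sketch (names ours) follows the definition above:

```python
def reachable(h, src):
    """The locations h^L(src) for L >= 1 (cut off at cycles/dangling ends)."""
    out, l = set(), src
    while l in h and h[l] not in out:
        l = h[l]
        out.add(l)
    return out

def rem(s, h):
    """Sketch of Rem^q_{s,h}: allocated locations that are not variable
    interpretations and lie on no path from some s(xi) to some s(xj)."""
    vals = set(s.values())
    mid = set()                          # locations strictly between variables
    for v in vals:
        for l in reachable(h, v):        # h^L(s(xi)) = l for some L >= 1 ...
            if reachable(h, l) & vals:   # ... and h^L'(l) = s(xj), L' >= 1
                mid.add(l)
    return {l for l in h if l not in vals and l not in mid}

# Locations 7 and 8 form a cycle detached from every program variable.
s = {'x1': 1, 'x2': 4}
h = {1: 2, 2: 4, 4: 4, 7: 8, 8: 7}
print(sorted(rem(s, h)))                 # [7, 8], so sizeR >= 2 holds
```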

**Lemma 6.** *Given* α, q ≥ 1 *and* i, j ∈ [1, q]*, for any atomic formula among* ls(xi, xj)*,* reach(xi, xj)*,* reach+(xi, xj)*,* emp *and* size ≥ β *with* β ≤ α*, there is a Boolean combination of test formulae from Test*(q, α) *logically equivalent to it.*

#### **4.2 Expressive Power and Small Model Property**

The sets of test formulae Test(q, α) are sufficient to capture the expressive power of SL(∗, reach+) (Theorem 2 below) and to deduce the small heap property of this logic (Theorem 3). We introduce an indistinguishability relation between memory states based on test formulae; see analogous relations in [13,15,22].

**Definition 3.** *Given* q, α ≥ 1*, we write* (s, h) ≈^q_α (s′, h′) *def* ⇔ *for all* ψ ∈ *Test*(q, α)*, we have* (s, h) |= ψ *iff* (s′, h′) |= ψ*.*

Theorem 2(I) states that if (s, h) ≈^q_α (s′, h′), then the two memory states cannot be distinguished by formulae whose syntactic resources are bounded in some way by q and α (details will follow; see the definition of msize(ϕ)).

Below, we state the key intermediate result of this section, which can be viewed as a distributivity lemma. The expressive power of the test formulae allows us to mimic, on two memory states equivalent with respect to the relation ≈^q_α, the separation of one of the heaps; this is essential in the proof of Theorem 2(I).

**Lemma 7.** *Let* q, α, α1, α2 ≥ 1 *with* α = α1 + α2*, and let* (s, h)*,* (s′, h′) *be such that* (s, h) ≈^q_α (s′, h′)*. For all heaps* h1, h2 *such that* h = h1 + h2*, there are heaps* h′1, h′2 *such that* h′ = h′1 + h′2*,* (s, h1) ≈^q_{α1} (s′, h′1) *and* (s, h2) ≈^q_{α2} (s′, h′2)*.*

For each formula <sup>ϕ</sup> in SL(∗, reach+), we define its *memory size* msize(ϕ) following the clauses below (see also [32]).

$$\begin{array}{l} \mathsf{msize}(\pi) \stackrel{\text{def}}{=} 1 \quad \text{for any atomic formula } \pi \\ \mathsf{msize}(\psi \ast \psi') \stackrel{\text{def}}{=} \mathsf{msize}(\psi) + \mathsf{msize}(\psi') \\ \mathsf{msize}(\psi \wedge \psi') \stackrel{\text{def}}{=} \max(\mathsf{msize}(\psi), \mathsf{msize}(\psi')) \\ \mathsf{msize}(\neg \psi) \stackrel{\text{def}}{=} \mathsf{msize}(\psi). \end{array}$$

We have 1 ≤ msize(ϕ) ≤ |ϕ|. Theorem 2 below establishes the properties that formulae in SL(∗, reach+) can express.
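The clauses for msize are a direct recursion on the structure of the formula; a transcription sketch (tuple-based formulae, our encoding):

```python
# Direct transcription of the msize clauses. Formulae are nested tuples; any
# head other than 'sep', 'and', 'not' is treated as an atomic formula.

def msize(phi):
    op = phi[0]
    if op == 'sep':                      # msize(psi * psi') = sum
        return msize(phi[1]) + msize(phi[2])
    if op == 'and':                      # msize(psi /\ psi') = max
        return max(msize(phi[1]), msize(phi[2]))
    if op == 'not':
        return msize(phi[1])
    return 1                             # atomic formula

# (emp * size >= 3) /\ not emp has memory size max(1 + 1, 1) = 2 <= |phi|.
phi = ('and', ('sep', ('emp',), ('size_ge', 3)), ('not', ('emp',)))
print(msize(phi))                        # 2
```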

**Theorem 2.** *Let* ϕ *be in* SL(∗, reach+) *built over the variables* x1, ..., xq*. (I) For all* α ≥ 1 *such that msize*(ϕ) ≤ α *and for all memory states* (s, h), (s′, h′) *such that* (s, h) ≈^q_α (s′, h′)*, we have* (s, h) |= ϕ *iff* (s′, h′) |= ϕ*. (II)* ϕ *is logically equivalent to a Boolean combination of test formulae from Test*(q, *msize*(ϕ))*.*

The proof of Theorem 2(I) is by structural induction on ϕ. The base cases for atomic formulae follow from Lemma 6, whereas the inductive cases for Boolean connectives are immediate. For the separating conjunction, suppose (s, h) |= ψ1 ∗ ψ2 and msize(ψ1 ∗ ψ2) ≤ α. There are heaps h1 and h2 such that h = h1 + h2, (s, h1) |= ψ1 and (s, h2) |= ψ2. As α ≥ msize(ψ1 ∗ ψ2) = msize(ψ1) + msize(ψ2), there exist α1 and α2 such that α = α1 + α2, α1 ≥ msize(ψ1) and α2 ≥ msize(ψ2). By Lemma 7, there exist heaps h′1 and h′2 such that h′ = h′1 + h′2, (s, h1) ≈^q_{α1} (s′, h′1) and (s, h2) ≈^q_{α2} (s′, h′2). By the induction hypothesis, we get (s′, h′1) |= ψ1 and (s′, h′2) |= ψ2. Consequently, we obtain (s′, h′) |= ψ1 ∗ ψ2.

As an example, we can apply this result to the memory states from Fig. 1. We have already shown how to distinguish (s1, h1) from (s2, h2) using a formula with only one separating conjunction. Theorem 2 ensures that these two memory states do not satisfy the same set of test formulae for α ≥ 2; indeed, only (s1, h1) satisfies seesq(xi, xj) ≥ 2. The same argument applies to (s3, h3) and (s4, h4): only (s3, h3) satisfies the test formula mq(xi, xj) ↪ mq(xj, xi). Clearly, Theorem 2(II) relates separation logic with classical logic, as advocated also in the works [10,23]. It is now possible to establish a small heap property.

**Theorem 3.** *Let* ϕ *be a satisfiable* SL(∗, reach+) *formula built over* x1, ..., xq*. There is* (s, h) *such that* (s, h) |= ϕ *and* card(dom(h)) ≤ (q² + q) · (|ϕ| + 1) + |ϕ|*.*

The small heap property for SL(∗, reach+) is inherited from the small heap property for the Boolean combinations of test formulae, which is analogous to the small model property for other theories of singly linked lists, see e.g. [13,27].

#### **4.3 Complexity Upper Bounds**

Let us draw some consequences of Theorem 3. First, for the logic SL(∗, reach+), we get a PSpace upper bound, which matches the lower bound already known for SL(∗) [11].

**Theorem 4.** *The satisfiability problem for SL(∗, reach+) is PSpace-complete.*

Besides, we may consider restricting the use of Boolean connectives. We write Bool(SHF) for the set of Boolean combinations of formulae from the symbolic heap fragment [2]. The entailment/satisfiability problem for the symbolic heap fragment is solved in PTime in [12,17], whereas the satisfiability problem for a slight variant of Bool(SHF) is shown to be in NP in [26, Theorem 4]. Theorem 3 allows us to conclude this NP upper bound as a by-product (we conjecture that our quadratic upper bound on the number of cells could be improved to a linear one in that case).

**Corollary 4.** *The satisfiability problem for Bool(SHF) is NP-complete.*

It is possible to push the PSpace upper bound further by allowing occurrences of −∗ in a controlled way. Let SL(∗, reach+, ⋃q,α Test(q, α)) be the extension of SL(∗, reach+) augmented with the test formulae. The memory size function is extended accordingly: msize(v ↪ v′) := 1, msize(seesq(v, v′) ≥ β + 1) := β + 1, msize(size ≥ β) := β and msize(alloc(v)) := 1. When formulae are encoded as trees, we have 1 ≤ msize(ϕ) ≤ |ϕ|·αϕ, where αϕ is the maximal constant occurring in ϕ. Theorem 2(I) admits a counterpart for SL(∗, reach+, ⋃q,α Test(q, α)) and consequently, any formula ϕ built over x1, ..., xq can be shown equivalent to a Boolean combination of test formulae from Test(q, |ϕ|·αϕ). By Theorem 3, any satisfiable formula therefore has a model with card(dom(h)) ≤ (q² + q)·(|ϕ|·αϕ + 1) + |ϕ|·αϕ. Hence, the satisfiability problem for SL(∗, reach+, ⋃q,α Test(q, α)) is in PSpace when the constants are encoded in unary. Now, we can state the new PSpace upper bound for Boolean combinations of formulae from SL(∗, −∗) ∪ SL(∗, reach+).

**Theorem 5.** *The satisfiability problem for Boolean combinations of formulae from SL(∗, −∗) ∪ SL(∗, reach+) is PSpace-complete.*

To conclude, let us introduce the largest fragment including −∗ and ls for which decidability can be established so far.

**Theorem 6.** *The satisfiability problem for the fragment of* SL(∗, −∗, reach+) *in which* reach<sup>+</sup> *is not in the scope of* −∗ *is decidable.*

### **5 Conclusion**

We studied the effects of adding ls to SL(∗, −∗) and variants. SL(∗, −∗, ls) is shown undecidable (Theorem 1) and non-finitely axiomatisable, which is quite unexpected since there are no first-order quantifications. This result is strengthened to even weaker extensions of SL(∗, −∗), such as the one augmented with n(x) = n(y), n(x) ↪ n(y) and alloc<sup>−1</sup>(x), or the one augmented with reach(x, y) = 2 and reach(x, y) = 3. If the magic wand is discarded, we have established that the satisfiability problem for SL(∗, ls) is PSpace-complete, by introducing a class of test formulae that captures the expressive power of SL(∗, ls) and leads to a small heap property. This logic contains the Boolean combinations of symbolic heaps, and our proof technique yields an NP upper bound for such formulae. Moreover, we have shown that the satisfiability problem for SL(∗, −∗, reach+) restricted to formulae in which reach+ is not in the scope of −∗ is decidable, giving the largest known decidable fragment in which −∗ and reach+ (or ls) cohabit. So, we have provided proof techniques to establish undecidability when ∗, −∗ and ls are present, and to establish decidability based on test formulae. This paves the way to investigating the decidability status of SL(−∗, ls) as well as of the positive fragment of SL(∗, −∗, ls) from [30,31].

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **The Equational Theory of the Natural Join and Inner Union is Decidable**

Luigi Santocanale

LIS, CNRS UMR 7020, Aix-Marseille Université, Marseille, France
luigi.santocanale@lis-lab.fr

**Abstract.** The natural join and the inner union operations combine relations of a database. Tropashko and Spight [25] realized that these two operations are the meet and join operations in a class of lattices, known by now as the relational lattices. They then proposed lattice theory as an algebraic approach to the theory of databases, alternative to the relational algebra.

Previous works [17,23] proved that the quasiequational theory of these lattices—that is, the set of definite Horn sentences valid in all the relational lattices—is undecidable, even when the signature is restricted to the pure lattice signature.

We prove here that the equational theory of relational lattices is decidable. That is, we provide an algorithm to decide if two lattice-theoretic terms *t, s* are made equal under all interpretations in some relational lattice. We achieve this goal by showing that if an inclusion *t* ≤ *s* fails in any of these lattices, then it fails in a relational lattice whose size is bounded by a triple exponential function of the sizes of *t* and *s*.

### **1 Introduction**

The natural join and the inner union operations combine relations (i.e. tables) of a database. SQL-like languages construct queries by making repeated use of the natural join and of the union. The inner union is a mathematically well-behaved variant of the union—for example, it does not introduce empty cells. Tropashko and Spight realized [25,26] that these two operations are the meet and join operations in a class of lattices, known by now as the class of relational lattices. They then proposed lattice theory as an algebraic approach, alternative to Codd's relational algebra [4], to the theory of databases.

Roughly speaking, elements of the relational lattice R(D, A) are tables of a database, where A is a set of column names and D is the set of possible cell values. Let us illustrate the two operations. The natural join takes two tables and constructs a new one whose columns are indexed by the union of the headers, and whose rows are glueings of the rows along identical values in common columns.

Supported by the Project TICAMORE ANR-16-CE91-0002-01.

© The Author(s) 2018

C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 494–510, 2018. https://doi.org/10.1007/978-3-319-89366-2\_27

The inner union restricts two tables to the common columns and lists all the rows of the two tables. Using this operation one can construct, for example, a table of users given two (or more) tables of people having different roles.

Since we shall focus on lattice-theoretic considerations, we shall use the symbols ∧ and ∨ in place of the symbols ⋈ and ∪ used by database theorists.
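To make the two operations concrete, here is a minimal Python sketch; the encoding of a table as a (header, rows) pair with rows as frozensets of (attribute, value) pairs is our own illustrative choice, not code from the paper:

```python
# A table is a pair (header, rows): header a frozenset of attribute names,
# each row a frozenset of (attribute, value) pairs over the header.

def row(d):
    return frozenset(d.items())

def restrict(r, attrs):
    return frozenset((a, v) for a, v in r if a in attrs)

def natural_join(rel1, rel2):
    (h1, t1), (h2, t2) = rel1, rel2
    common = h1 & h2
    # glue pairs of rows that agree on the common columns
    rows = frozenset(r1 | r2 for r1 in t1 for r2 in t2
                     if restrict(r1, common) == restrict(r2, common))
    return (h1 | h2, rows)

def inner_union(rel1, rel2):
    (h1, t1), (h2, t2) = rel1, rel2
    common = h1 & h2
    # restrict both tables to the common columns, keep all rows
    return (common, frozenset(restrict(r, common) for r in t1 | t2))

people = (frozenset({"name", "role"}),
          frozenset({row({"name": "Ada", "role": "admin"})}))
logins = (frozenset({"name", "login"}),
          frozenset({row({"name": "Ada", "login": "ada01"})}))

joined = natural_join(people, logins)
assert sorted(joined[0]) == ["login", "name", "role"]

union = inner_union(people, logins)
assert union == (frozenset({"name"}), frozenset({row({"name": "Ada"})}))
```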

A first important attempt to axiomatize these lattices was done by Litak et al. [17]. They proposed an axiomatization, comprising equations and quasiequations, in a signature that extends the pure lattice signature with a constant, the header constant. A main result of that paper is that the quasiequational theory of relational lattices is undecidable in this extended signature. Their proof mimics Maddux's proof that the equational theory of cylindric algebras of dimension n ≥ 3 is undecidable [18].

Their result was further refined by us in [23]: the quasiequational theory of relational lattices is undecidable even when the signature considered is the least one, comprising only the meet (natural join) and the join (inner union) operations. Our proof relied on a deeper algebraic insight: we proved that it is undecidable whether a finite subdirectly irreducible lattice can be embedded into a relational lattice—from this kind of result, undecidability of the quasiequational theory immediately follows. We proved the above statement by reducing to it an undecidable problem in modal logic, the coverability problem of a frame by a universal **S5**<sup>3</sup>-product frame [12]. In turn, this problem was shown to be undecidable by a reduction from the representability problem of finite simple relation algebras [11].

We prove here that the equational theory of relational lattices is decidable. That is, we prove that it is decidable whether two lattice terms t and s are such that t<sup>v</sup> = s<sup>v</sup> for any valuation v : X −→ R(D, A) of variables in a relational lattice R(D, A). We achieve this goal by showing that this theory has a kind of finite model property of bounded size. Our main result, Theorem 25, sounds as follows: *if an inclusion* t ≤ s *fails in a relational lattice* R(D, A), *then this inclusion fails in a finite lattice* R(E, B), *such that* B *is bounded by an exponential function in the size of* t *and* s, *and* E *is linear in the size of* t. It follows that the size of R(E, B) can be bounded by a triple exponential function in the size of t and s. In algebraic terms, our finite model theorem states that the variety generated by the relational lattices is actually generated by those relational lattices that are finite.

In our opinion, our results are significant in two respects. Firstly, the algebra of the natural join and of the inner union has a direct connection to the widespread SQL-like languages, see e.g. [17]. We dare to say that most programmers that use a database—more or less explicitly, for example within server-side web programs—are using these operations. In view of the widespread use of these languages, the decidability status of this algebraic system deserved to be settled. Moreover, we believe that the mathematical insights contained in our decidability proof will contribute to a further understanding of the algebraic system. For example, it is not known yet whether a complete finite axiomatic basis exists for relational lattices; finding one could eventually yield applications, e.g. on the side of automated optimization of queries.

Secondly, our work exhibits the equational theory of relational lattices as a decidable one within a long list of undecidable logical theories [11,12,17,18,23] that are used to model the constructions of relational algebra. We are exploring the limits of decidability, a research direction widely explored in automata-theoretic settings starting from [3]. We do this within logic, with plenty of potential applications, coming from the undecidable side and crossing the border: after the quasiequational theory, which is undecidable, the next natural theory on the list, the equational theory of relational lattices, turns out to be decidable.

On the technical side, our work relies on [22], where the duality theory for finite lattices developed in [21] was used to investigate equational axiomatizations of relational lattices. A key insight from [22] is that relational lattices are, in some sense, duals of generalized ultrametric spaces over a powerset algebra. It is this perspective that made it possible to uncover the strong similarity between lattice-theoretic methods and tools from modal logic—in particular the theory of combination of modal logics, see e.g. [15]. We exploit this similarity here to adapt filtration techniques from modal logic [8] to lattice theory. Also, the notion of generalized ultrametric space over a powerset algebra and the characterization of injective objects in the category of these spaces have been fundamental tools to prove the undecidability of the quasiequational theory [23] as well as, in the present case, the decidability of the equational theory.

The paper is organised as follows. We recall in Sect. 2 some definitions and facts about lattices. The relational lattices R(D, A) are introduced in Sect. 3. In Sect. 4 we show how to construct a lattice L(X, δ) from a generalized ultrametric space (X, δ). This construction generalizes that of the lattice R(D, A): if X = D<sup>A</sup> is the set of all functions from A to D and δ is a sort of Hamming distance, then L(X, δ) = R(D, A). We use the functorial properties of L to argue that when a finite space (X, δ) has the property of being pairwise-complete, then L(X, δ) belongs to the variety generated by the relational lattices. In Sect. 5 we show that if an inclusion t ≤ s fails in a lattice R(D, A), then we can construct a finite subset T(f, t) ⊆ D<sup>A</sup>, a "tableau" witnessing the failure, such that if T(f, t) ⊆ T and T is finite, then t ≤ s fails in a finite lattice of the form L(T, δB), where the distance δB takes values in a finite powerset algebra P(B). In Sect. 6, we show how to extend T(f, t) to a finite bigger set G, so that (G, δB), as a space over the powerset algebra P(B), is pairwise-complete. The lattice L(G, δB) fails the inclusion t ≤ s; out of it, we build a lattice of the form R(E, B) which fails the same inclusion; the sizes of E and B can be bounded by functions of the sizes of the terms t and s. Perspectives for future research directions appear in the last Sect. 7.

### **2 Elementary Notions on Orders and Lattices**

We assume some basic knowledge of order and lattice theory as presented in standard monographs [5,9]. Most of the lattice theoretic tools we use originate from the monograph [7].

A *lattice* is a poset L such that every finite non-empty subset X ⊆ L admits a smallest upper bound ⋁X and a greatest lower bound ⋀X. A lattice can also be understood as a structure A for the functional signature (∨, ∧), such that the interpretations of these two binary function symbols both give A the structure of an idempotent commutative semigroup, the two semigroup structures being connected by the absorption laws x ∧ (y ∨ x) = x and x ∨ (y ∧ x) = x. Once a lattice is presented as such a structure, the order is recovered by stating that x ≤ y holds if and only if x ∧ y = x.
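The algebraic presentation can be checked mechanically on a small example; the sketch below (our own, not from the paper) verifies the absorption laws and the recovery of the order on the powerset lattice of {1, 2, 3}, where meet is intersection and join is union:

```python
# Check absorption laws and order recovery on the powerset lattice of {1,2,3}.
from itertools import product

sets = [frozenset(s) for s in
        ((), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3))]

for x, y in product(sets, repeat=2):
    assert (x & (y | x)) == x              # x ∧ (y ∨ x) = x
    assert (x | (y & x)) == x              # x ∨ (y ∧ x) = x
    assert ((x & y) == x) == (x <= y)      # x ∧ y = x  iff  x ≤ y
```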

A lattice L is *complete* if any subset X ⊆ L admits a smallest upper bound ⋁X. It can be shown that this condition implies that any subset X ⊆ L admits a greatest lower bound ⋀X. A lattice is *bounded* if it has a least element ⊥ and a greatest element ⊤. A complete lattice (in particular, a finite lattice) is bounded, since ⋁∅ and ⋀∅ are, respectively, the least and greatest elements of the lattice.

If P and Q are partially ordered sets, then a function f : P −→ Q is *order-preserving* (or *monotone*) if p ≤ p′ implies f(p) ≤ f(p′). If L and M are lattices, then a function f : L −→ M is a *lattice morphism* if it preserves the lattice operations ∨ and ∧. A lattice morphism is always order-preserving. A lattice morphism f : L −→ M between bounded lattices L and M is *bound-preserving* if f(⊥) = ⊥ and f(⊤) = ⊤. A function f : P −→ Q is said to be *left adjoint* to an order-preserving g : Q −→ P if f(p) ≤ q holds if and only if p ≤ g(q) holds, for every p ∈ P and q ∈ Q; such a left adjoint, when it exists, is unique. Dually, a function g : Q −→ P is said to be *right adjoint* to an order-preserving f : P −→ Q if f(p) ≤ q holds if and only if p ≤ g(q) holds; clearly, f is left adjoint to g if and only if g is right adjoint to f, so we say that f and g form an adjoint pair. If P and Q are complete lattices, being a left adjoint (resp., right adjoint) to some g (resp., to some f) is equivalent to preserving all (possibly infinite) joins (resp., all meets).

A *Moore family on* P(U) is a collection F of subsets of U which is closed under arbitrary intersections. Given a Moore family F on P(U), the correspondence sending Z ⊆ U to Z̄ := ⋂{ Y ∈ F | Z ⊆ Y } is a *closure operator* on P(U), that is, an order-preserving, inflationary and idempotent endofunction of P(U). The subsets in F, called the *closed sets*, are exactly the fixpoints of this closure operator. A Moore family F has the structure of a complete lattice where

$$\bigwedge X := \bigcap X \,, \qquad \qquad \bigvee X := \overline{\bigcup X} \,. \tag{1}$$
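The closure operator induced by a Moore family can be computed directly; the following sketch (the family below is our own illustrative choice) takes the intersection of all members containing a given set:

```python
# Closure operator from a Moore family F on P(U):
# the closure of Z is the intersection of all members of F containing Z.

U = frozenset({1, 2, 3})

# F is closed under arbitrary intersections (U gives the empty intersection).
F = [frozenset(), frozenset({1}), frozenset({1, 2}), U]

def closure(Z):
    out = U
    for Y in F:
        if Z <= Y:
            out = out & Y
    return out

# inflationary and idempotent:
assert closure(frozenset({2})) == frozenset({1, 2})
assert closure(closure(frozenset({2}))) == closure(frozenset({2}))

# joins in F are closures of unions, as in formula (1):
assert closure(frozenset({1}) | frozenset({2})) == frozenset({1, 2})
```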

The notion of Moore family can also be defined for an arbitrary complete lattice L: Moore families on L turn out to be in bijection with closure operators on L. We shall actually consider the dual notion: a *dual Moore family on a complete lattice* L is a subset F ⊆ L that is closed under arbitrary joins. Such an F determines an interior operator (an order-preserving, decreasing and idempotent endofunction on L) by the formula x◦ = ⋁{ y ∈ F | y ≤ x }, and has the structure of a complete lattice, where ⋁F X := ⋁L X and ⋀F X := (⋀L X)◦. Dual Moore families on L are in bijection with interior operators on L. Finally, let us mention that closure (resp., interior) operators arise from adjoint pairs f and g (with f left adjoint to g) by the formula x̄ = g(f(x)) (resp., x◦ = f(g(x))).

### **3 The Relational Lattices R(***D, A***)**

Throughout this paper we use Y<sup>X</sup> for the set of functions of domain X and codomain Y.

Let A be a collection of attributes (or column names) and let D be a set of cell values. A *relation* on <sup>A</sup> and <sup>D</sup> is a pair (α, T) where <sup>α</sup> <sup>⊆</sup> <sup>A</sup> and <sup>T</sup> <sup>⊆</sup> <sup>D</sup><sup>α</sup>. Elements of the relational lattice<sup>1</sup> R(D, A) are relations on A and D. Informally, a relation (α, T) represents a table of a relational database, with α being the header, i.e. the collection of names of columns, while T is the collection of rows.

Before we define the natural join, the inner union operations, and the order on R(D, A), let us recall some key operations. If α ⊆ β ⊆ A and f ∈ D<sup>β</sup>, then we shall use f↾α ∈ D<sup>α</sup> for the restriction of f to α; if T ⊆ D<sup>β</sup>, then T↾α shall denote the projection to α, that is, the direct image of T along restriction, T↾α := { f↾α | f ∈ T }; if T ⊆ D<sup>α</sup>, then iβ(T) shall denote the cylindrification to β, that is, the inverse image along restriction, iβ(T) := { f ∈ D<sup>β</sup> | f↾α ∈ T }. Recall that iβ is right adjoint to ↾α. With this in mind, the natural join and the inner union of relations are respectively described by the following formulas:

$$\begin{aligned} \left(\alpha\_1, T\_1\right) \wedge \left(\alpha\_2, T\_2\right) &:= \left(\alpha\_1 \cup \alpha\_2, T\right) \\ \text{where } T &= \left\{ f \mid f \restriction\_{\alpha\_i} \in T\_i, i = 1, 2 \right\} \\ &= i\_{\alpha\_1 \cup \alpha\_2} \left(T\_1\right) \cap i\_{\alpha\_1 \cup \alpha\_2} \left(T\_2\right), \\ \left(\alpha\_1, T\_1\right) \vee \left(\alpha\_2, T\_2\right) &:= \left(\alpha\_1 \cap \alpha\_2, T\right) \\ \text{where } T &= \left\{ f \mid \exists i \in \{1, 2\}, \exists g \in T\_i \text{ s.t. } g \restriction\_{\alpha\_1 \cap \alpha\_2} = f \right\} \\ &= T\_1 \restriction\_{\alpha\_1 \cap \alpha\_2} \cup\, T\_2 \restriction\_{\alpha\_1 \cap \alpha\_2} \text{ .} \end{aligned}$$

The order is then given by (α1, T1) ≤ (α2, T2) iff α2 ⊆ α1 and T1↾α2 ⊆ T2.
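The order admits an equally direct implementation; a sketch under the same (header, rows) encoding used informally above, with all names our own:

```python
# The order on R(D, A): (a1, t1) <= (a2, t2) iff a2 ⊆ a1 and the
# projection of t1 to a2 is contained in t2.

def restrict(r, attrs):
    return frozenset((a, v) for a, v in r if a in attrs)

def leq(rel1, rel2):
    (a1, t1), (a2, t2) = rel1, rel2
    return a2 <= a1 and {restrict(r, a2) for r in t1} <= set(t2)

wide = (frozenset({"name", "role"}),
        frozenset({frozenset({("name", "Ada"), ("role", "admin")})}))
narrow = (frozenset({"name"}),
          frozenset({frozenset({("name", "Ada")})}))

# a table with more columns whose rows project into the other lies below it
assert leq(wide, narrow)
assert not leq(narrow, wide)
```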

A convenient way of describing these lattices was introduced in [17, Lemma 2.1]. The authors showed that the relational lattices R(D, A) are isomorphic to the lattices of closed subsets of A ∪ D<sup>A</sup>, where Z ⊆ A ∪ D<sup>A</sup> is said to be closed if it is a fixed point of the closure operator ( − ) defined as

$$\overline{Z} := Z \cup \{\, f \in D^A \mid A \setminus Z \subseteq \mathrm{Eq}(f, g), \text{ for some } g \in Z \,\} \,,$$

<sup>1</sup> In [17] such a lattice is called *full* relational lattice. The wording "class of relational lattices" is used there for the class of lattices that have an embedding into some lattice of the form R(*D, A*).

where in the formula above Eq(f, g) is the equalizer of f and g. Letting δ(f, g) := { x ∈ A | f(x) ≠ g(x) }, the above definition of the closure operator is obviously equivalent to the following one:

$$\overline{Z} := \alpha \cup \{ f \in D^A \mid \delta(f, g) \subseteq \alpha, \text{ for some } g \in Z \cap D^A \}, \text{ with } \alpha = Z \cap A.$$

From now on, we rely on this representation of relational lattices.
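This representation is easy to compute on a toy instance; the sketch below (our own encoding of functions A → D as dicts) implements the second formulation of the closure operator:

```python
# Closure on A ∪ D^A, second formulation: keep alpha = Z ∩ A and add
# every f ∈ D^A with delta(f, g) ⊆ alpha for some g already in Z.
from itertools import product

A = ("a", "b")
D = (0, 1)
ALL = [dict(zip(A, vs)) for vs in product(D, repeat=len(A))]

def delta(f, g):                  # attributes on which f and g differ
    return {x for x in A if f[x] != g[x]}

def closure(alpha, funcs):        # the pair (alpha, funcs) encodes Z
    closed = [f for f in ALL if any(delta(f, g) <= alpha for g in funcs)]
    return alpha, closed

alpha, closed = closure({"a"}, [{"a": 0, "b": 0}])
# letting column "a" vary stays inside the closure; column "b" does not
assert {"a": 1, "b": 0} in closed
assert {"a": 0, "b": 1} not in closed
```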

### **4 Lattices from Metric Spaces**

Generalized ultrametric spaces over a Boolean algebra P(A) turn out to be a convenient tool for studying relational lattices [17,22]. Metrics are well known tools from graph theory, see e.g. [10]. Generalized ultrametric spaces over a Boolean algebra P(A) were introduced in [20] to study equivalence relations.

**Definition 1.** *An* ultrametric space over P(A) *(briefly, a* space*) is a pair* (X, δ)*, with* δ : X × X −→ P(A) *such that, for every* f, g, h ∈ X*,*

$$
\delta(f,f) \subseteq \emptyset, \qquad \qquad \qquad \delta(f,g) \subseteq \delta(f,h) \cup \delta(h,g) \,, \tag{2}
$$

$$
\delta(f, g) = \emptyset \text{ implies } f = g \,, \qquad \qquad \delta(f, g) = \delta(g, f) \,. \tag{3}
$$

That is, we have defined an ultrametric space over P(A) as a category (with a small set of objects) enriched over (P(A)<sup>op</sup>, ∅, ∪) (equation (2), see [16]) which moreover is *reduced* and *symmetric* (conditions (3)).

A *morphism* of spaces<sup>2</sup> ψ : (X, δX) −→ (Y, δY) is a function ψ : X −→ Y such that δY(ψ(f), ψ(g)) ⊆ δX(f, g), for each f, g ∈ X. Obviously, spaces and their morphisms form a category. If δY(ψ(f), ψ(g)) = δX(f, g) for each f, g ∈ X, then ψ is said to be an *isometry*. A space (X, δ) is said to be *pairwise-complete*, see [2], or *convex*, see [19], if, for each f, g ∈ X and α, β ⊆ A,

$$
\delta(f, g) \subseteq \alpha \cup \beta \text{ implies } \delta(f, h) \subseteq \alpha \text{ and } \delta(h, g) \subseteq \beta, \text{ for some } h \in X.
$$
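For a finite space, the condition above can be checked by brute force. The sketch below (all names ours) verifies it for a small full function space, where a witness h can always be built pointwise from f and g:

```python
# Brute-force check that a small D^A is pairwise-complete.
from itertools import product, combinations

A = ("a", "b")
D = (0, 1)
X = [dict(zip(A, vs)) for vs in product(D, repeat=len(A))]

def delta(f, g):
    return {x for x in A if f[x] != g[x]}

def subsets(s):
    s = list(s)
    return [set(c) for r in range(len(s) + 1) for c in combinations(s, r)]

ok = all(
    any(delta(f, h) <= al and delta(h, g) <= be for h in X)
    for f in X for g in X
    for al in subsets(A) for be in subsets(A)
    if delta(f, g) <= al | be)
assert ok
```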

**Proposition 2 (see** [2,20]**).** *If* A *is finite, then a space is injective in the category of spaces if and only if it is pairwise-complete.*

If (X, δX) is a space and Y ⊆ X, then the restriction of δ<sup>X</sup> to Y induces a space (Y,δX); we say then that (Y,δX) is a *subspace* of X. Notice that the inclusion of Y into X yields an isometry of spaces.

Our main example of space over P(A) is (D<sup>A</sup>, δ), with D<sup>A</sup> the set of functions from A to D and the distance defined by

$$\delta(f, g) := \left\{ a \in A \mid f(a) \neq g(a) \right\}.\tag{4}$$
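It is straightforward to verify mechanically that this distance satisfies the axioms of Definition 1; a small sketch (our own encoding) over a two-attribute, two-value instance:

```python
# Verify the space axioms for delta(f, g) = { a | f(a) != g(a) } on D^A.
from itertools import product

A = ("a", "b")
D = (0, 1)
X = [dict(zip(A, vs)) for vs in product(D, repeat=len(A))]

def delta(f, g):
    return frozenset(x for x in A if f[x] != g[x])

for f, g, h in product(X, repeat=3):
    assert delta(f, f) == frozenset()                    # reflexivity
    assert delta(f, g) <= delta(f, h) | delta(h, g)      # triangle law
    assert delta(f, g) == delta(g, f)                    # symmetry
    assert not (delta(f, g) == frozenset() and f != g)   # reduced
```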

A second example is a slight generalization of the previous one. Given a surjective function π : D −→ A, let Secπ denote the set of all the functions f : A −→ D such

<sup>2</sup> As *P*(*A*) is not totally ordered, we avoid calling a morphism "*non-expanding map*" as it is often done in the literature.

that π ◦ f = idA. Then Secπ ⊆ D<sup>A</sup>, so Secπ with the distance inherited from (D<sup>A</sup>, δ) can be made into a space. Considering the first projection π1 : A × D −→ A, we see that (D<sup>A</sup>, δ) is isomorphic to the space Secπ1. By identifying f ∈ Secπ with the vector ⟨f(a) ∈ π<sup>−1</sup>(a) | a ∈ A⟩, we see that

$$\mathbf{Sec}\_{\pi} = \prod\_{a \in A} D\_a \,, \quad \text{where } D\_a := \pi^{-1}(a). \tag{5}$$

That is, the spaces of the form Secπ are naturally related to Hamming graphs in combinatorics [13], dependent function types in type theory [6,14], and universal **S5**<sup>A</sup>-product frames in modal logic [12].

**Theorem 3 (see** [23]**).** *Spaces of the form Sec*<sup>π</sup> *are, up to isomorphism, exactly the injective objects in the category of spaces.*

#### **4.1 The Lattice of a Space**

The construction of the lattice R(D, A) can be carried out from any space. Namely, for a space (X, δ) over P(A), say that Z ⊆ X is α*-closed* if g ∈ Z and δ(f, g) ⊆ α imply f ∈ Z. Clearly, the α-closed subsets of X form a Moore family so, for Z ⊆ X, we denote by Z̄<sup>α</sup> the least α-closed subset of X containing Z. Observe that f ∈ Z̄<sup>α</sup> if and only if δ(f, g) ⊆ α for some g ∈ Z. Here and in the rest of the paper, we shall exploit the obvious isomorphism between P(A) × P(X) and P(A ∪ X) (where we suppose A and X disjoint) and notationally identify a pair (α, Z) ∈ P(A) × P(X) with its image α ∪ Z ∈ P(A ∪ X). Let us say then that (α, Z) is closed if Z is α-closed. Closed subsets of P(A ∪ X) form a Moore family, whence a complete lattice where the order is subset inclusion.
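The observation that one step of the δ-ball relation suffices makes the least α-closed subset directly computable; a sketch (our own encoding) on a small space:

```python
# Least alpha-closed subset of a finite space:
# f belongs to it iff delta(f, g) ⊆ alpha for some g ∈ Z.
from itertools import product

A = ("a", "b")
D = (0, 1)
X = [dict(zip(A, vs)) for vs in product(D, repeat=len(A))]

def delta(f, g):
    return {x for x in A if f[x] != g[x]}

def alpha_closure(Z, alpha):
    return [f for f in X if any(delta(f, g) <= alpha for g in Z)]

Z = [{"a": 0, "b": 0}]
# closing under changes of column "b" adds exactly one function
assert alpha_closure(Z, {"b"}) == [{"a": 0, "b": 0}, {"a": 0, "b": 1}]
assert alpha_closure(Z, set()) == Z      # the ∅-closure adds nothing
```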

**Definition 4.** *For a space* (X, δ)*, the lattice* L(X, δ) *is the lattice of closed subsets of* P(A ∪ X)*.*

Clearly, for the space (D<sup>A</sup>, δ), we have L(D<sup>A</sup>, δ) = R(D, A). Let us mention that meets and joins in L(X, δ) are computed using the formulas in (1). In particular, for joins,

$$(\alpha, Y) \vee (\beta, Z) = (\alpha \cup \beta, \overline{Y \cup Z}^{\alpha \cup \beta})\,.$$

The above formula yields that, for any f ∈ X, f ∈ (α, Y ) ∨ (β,Z) if and only if δ(f,g) ⊆ α ∪ β, for some g ∈ Y ∪ Z.

We argue next that the above construction is functorial. Below, for a function ψ : X −→ Y, ψ<sup>−1</sup> : P(Y) −→ P(X) is the inverse image of ψ, defined by ψ<sup>−1</sup>(Z) := { x ∈ X | ψ(x) ∈ Z }.

**Proposition 5.** *If ψ : (X, δX) −→ (Y, δY) is a space morphism and (α, Z) ∈ L(Y, δY), then (α, ψ<sup>−1</sup>(Z)) ∈ L(X, δX). Therefore, by defining L(ψ)(α, Z) := (α, ψ<sup>−1</sup>(Z)), the construction L lifts to a contravariant functor from the category of spaces to the category of complete meet-semilattices.*

*Proof.* Let f ∈ X be such that, for some g ∈ ψ<sup>−1</sup>(Z) (i.e. ψ(g) ∈ Z), we have δX(f, g) ⊆ α. Then δY(ψ(f), ψ(g)) ⊆ δX(f, g) ⊆ α, so ψ(f) ∈ Z, since Z is α-closed, and hence f ∈ ψ<sup>−1</sup>(Z). In order to see that L(ψ) preserves arbitrary intersections, recall that ψ<sup>−1</sup> does.

Notice that L(ψ) might not preserve arbitrary joins.

**Proposition 6.** *The lattices L(Secπ) generate the same lattice variety as the lattices R(D, A).*

That is, a lattice equation holds in all the lattices L(Secπ) if and only if it holds in all the relational lattices R(D, A).

*Proof.* Clearly, each lattice R(D, A) is of the form L(Secπ). Thus we only need to argue that every lattice of the form L(Secπ) belongs to the lattice variety generated by the R(D, A), that is, the least class of lattices containing the lattices R(D, A) and closed under products, sublattices, and homomorphic images. We argue as follows.

As every space Secπ embeds into a space (D<sup>A</sup>, δ) and a space Secπ is injective, we have maps ι : Secπ −→ (D<sup>A</sup>, δ) and ψ : (D<sup>A</sup>, δ) −→ Secπ such that ψ ◦ ι = idSecπ. By functoriality, L(ι) ◦ L(ψ) = idL(Secπ). Since L(ι) preserves all meets, it has a left adjoint ℓ : L(Secπ) −→ L(D<sup>A</sup>, δ) = R(D, A). It is easy to see that (ℓ, L(ψ)) is an EA-duet in the sense of [24, Definition 9.1] and therefore L(Secπ) is a homomorphic image of a sublattice of R(D, A), by [24, Lemma 9.7].

*Remark 7.* For the statement of [24, Lemma 9.7] to hold, additional conditions are necessary on the domain and the codomain of an EA-duet. Yet the implication deriving being a homomorphic image of a sublattice from the existence of an EA-duet is still valid under the hypothesis that, of the two arrows of the EA-duet, one preserves all joins and the other all meets.

#### **4.2 Extension from a Boolean Subalgebra**

We suppose that P(B) is a Boolean subalgebra of P(A) via an inclusion i : P(B) −→ P(A). If (X, δB) is a space over P(B), then we can transform it into a space (X, δA) over P(A) by setting δA(f,g) = i(δB(f,g)). We have therefore two lattices L(X, δB) and L(X, δA).

**Proposition 8.** *Let β ⊆ B and Y ⊆ X. Then Y is β-closed if and only if it is i(β)-closed. Consequently, the map i∗, sending (β, Y) ∈ L(X, δB) to i∗(β, Y) := (i(β), Y) ∈ L(X, δA), is a lattice embedding.*

*Proof.* Observe that δB(f, g) ⊆ β if and only if δA(f, g) = i(δB(f, g)) ⊆ i(β). This immediately implies the first statement of the Proposition, and also that, for Y ⊆ X, the β-closure of Y coincides with its i(β)-closure. Using the fact that meets are computed as intersections and that i preserves intersections, it is easily seen that i∗ preserves meets. For joins, let us compute as follows:

$$\begin{split} i\_\*(\beta\_1, Y\_1) \vee i\_\*(\beta\_2, Y\_2) &= (i(\beta\_1) \cup i(\beta\_2), \overline{Y\_1 \cup Y\_2}^{i(\beta\_1) \cup i(\beta\_2)}) \\ &= (i(\beta\_1 \cup \beta\_2), \overline{Y\_1 \cup Y\_2}^{i(\beta\_1 \cup \beta\_2)}) = (i(\beta\_1 \cup \beta\_2), \overline{Y\_1 \cup Y\_2}^{\beta\_1 \cup \beta\_2}) \\ &= i\_\*(\beta\_1 \cup \beta\_2, \overline{Y\_1 \cup Y\_2}^{\beta\_1 \cup \beta\_2}) = i\_\*((\beta\_1, Y\_1) \vee (\beta\_2, Y\_2)). \end{split}$$

### **5 Failures from Big to Small Lattices**

The set of lattice terms is generated by the following grammar:

$$t := x \mid \top \mid t \wedge t \mid \bot \mid t \vee t \,,$$

where x belongs to a set of variables X. For lattice terms t1, ..., tn, we use Vars(t1, ..., tn) to denote the (finite) set of variables occurring in any of these terms. The size of a term t is the number of nodes in the representation of t as a tree. If v : X −→ L is a valuation of variables into a lattice L, the value of a term t w.r.t. the valuation v is defined by induction in the obvious way; here we shall use t<sup>v</sup> for it.

For $t, s$ two lattice terms, the inclusion $t \leq s$ is the equation $t \vee s = s$. Any lattice-theoretic equation is equivalent to a pair of inclusions, so the problem of deciding the equational theory of a class of lattices reduces to the problem of deciding inclusions. An inclusion $t \leq s$ is valid in a class of lattices $K$ if, for any valuation $v : X \longrightarrow L$ with $L \in K$, $t_v \leq s_v$; it fails in $K$ if for some $L \in K$ and $v : X \longrightarrow L$ we have $t_v \not\leq s_v$.

From now on, our goal shall be proving that if an inclusion $t \leq s$ fails in a lattice $\mathsf{R}(D, A)$, then it fails in a lattice $\mathsf{L}(\mathsf{Sec}_\pi)$, where $\mathsf{Sec}_\pi$ is a finite space over some finite Boolean algebra $P(B)$. The sizes of $B$ and of the space $\mathsf{Sec}_\pi$ shall be inferred from the sizes of $t$ and $s$.

From now on, let us fix terms $t$ and $s$, a lattice $\mathsf{R}(D, A)$, and a valuation $v : X \longrightarrow \mathsf{R}(D, A)$ such that $t_v \not\subseteq s_v$.

**Lemma 9.** *If, for some $a \in A$, $a \in t_v \setminus s_v$, then the inclusion $t \leq s$ fails in the lattice $\mathsf{R}(E, B)$ with $B = \emptyset$ and $E$ a singleton.*

*Proof.* The map sending $(\alpha, X) \in \mathsf{R}(D, A)$ to $\alpha \in P(A)$ is a lattice morphism. Therefore if $t \leq s$ fails because of $a \in A$, then it already fails in the Boolean lattice $P(A)$. Since $P(A)$ is distributive, $t \leq s$ fails in the two-element lattice. Now, when $B = \emptyset$ and $E$ is a singleton, $\mathsf{R}(E, B)$ is (isomorphic to) the two-element lattice, so the same inclusion fails in $\mathsf{R}(E, B)$.

Because of the Lemma, we shall focus on functions $f \in D^A$ such that $f \in t_v \setminus s_v$. In this case we shall say that $f$ *witnesses the failure of* $t \leq s$ (in $\mathsf{R}(D, A)$, w.r.t. the valuation $v$).

### **5.1 The Lattices $\mathsf{R}(D, A)_T$**

Let $T$ be a subset of $D^A$ and consider the subspace $(T, \delta)$ of $D^A$ induced by the inclusion $i_T : T \subseteq D^A$. According to Proposition 5, the inclusion $i_T$ induces a complete meet-semilattice homomorphism $\mathsf{L}(i_T) : \mathsf{R}(D, A) = \mathsf{L}(D^A, \delta) \longrightarrow \mathsf{L}(T, \delta)$. Such a map has a right adjoint $j_T : \mathsf{L}(T, \delta) \longrightarrow \mathsf{L}(D^A, \delta)$, which is a complete join-semilattice homomorphism; moreover $j_T$ is injective, since $\mathsf{L}(i_T)$ is surjective.

**Proposition 10.** *For a subset $T \subseteq D^A$ and $(\alpha, X) \in \mathsf{R}(D, A)$, $(\alpha, \overline{X \cap T}^{\alpha}) = j_T(\mathsf{L}(i_T)(\alpha, X))$. The set of elements of the form $(\alpha, \overline{X \cap T}^{\alpha})$, for $\alpha \subseteq A$ and $X \subseteq D^A$, is a complete sub-join-semilattice of $\mathsf{R}(D, A)$.*

*Proof.* It is easily seen that $\mathsf{L}(i_T)(\alpha, X) = (\alpha, X \cap T)$ and that, for $(\beta, Y) \in \mathsf{L}(T, \delta)$, $(\beta, Y) \subseteq (\alpha, X \cap T)$ if and only if $(\beta, \overline{Y}^{\beta}) \subseteq (\alpha, X)$, so $j_T(\beta, Y) = (\beta, \overline{Y}^{\beta})$.

It follows that the elements of the form $(\alpha, \overline{X \cap T}^{\alpha})$, where $(\alpha, X) \in \mathsf{R}(D, A)$, form a complete sub-join-semilattice of $\mathsf{R}(D, A)$: indeed, they are the image of the lattice $\mathsf{L}(T, \delta)$ under the complete join-semilattice homomorphism $j_T$. We argue next that, for any pair $(\alpha, X)$ (we do not require that $X$ is $\alpha$-closed), there is a $Z \subseteq D^A$ which is $\alpha$-closed and such that $\overline{X \cap T}^{\alpha} = \overline{Z \cap T}^{\alpha}$. Indeed, the equality

$$\overline{X \cap T}^{\alpha} = \overline{\overline{X \cap T}^{\alpha} \cap T}^{\alpha}$$

is easily verified, so we can let $Z = \overline{X \cap T}^{\alpha}$.

Therefore, the set of pairs of the form $(\alpha, \overline{X \cap T}^{\alpha})$ is a dual Moore family and a complete lattice, where joins are computed as in $\mathsf{R}(D, A)$, and where meets are computed in a way that we shall make explicit. For the moment, let us fix the notation.

**Definition 11.** *$\mathsf{R}(D, A)_T$ is the lattice of elements of the form $(\alpha, \overline{X \cap T}^{\alpha})$.*

By the proof of Proposition 10, the lattice $\mathsf{R}(D, A)_T$ is isomorphic to the lattice $\mathsf{L}(T, \delta)$. We shall use the symbol $\wedge\wedge$ for meets in $\mathsf{R}(D, A)_T$; these are computed by the formula

$$\bigwedge_{i \in I} (\alpha_i, X_i) = \Big(\bigcap_{i \in I} \alpha_i, \bigcap_{i \in I} X_i\Big)^{\circ},$$

where, for each $(\alpha, X) \in \mathsf{R}(D, A)$, $(\alpha, X)^{\circ}$ is the greatest pair in $\mathsf{R}(D, A)_T$ that is below $(\alpha, X)$. Standard theory on adjoints yields

$$(\alpha, X)^\circ = (j_T \circ \mathsf{L}(i_T))(\alpha, X) = (\alpha, \overline{X \cap T}^\alpha).$$

We obtain in this way the explicit formula for the binary meet in $\mathsf{R}(D, A)_T$:

$$(\alpha, \overline{X \cap T}^{\alpha}) \wedge\wedge (\beta, \overline{Y \cap T}^{\beta}) = (\alpha \cap \beta, \overline{\overline{X \cap T}^{\alpha} \cap \overline{Y \cap T}^{\beta} \cap T}^{\alpha \cap \beta}).$$

Remark that we have

$$(\alpha, X) \wedge\wedge (\beta, Y) \subseteq (\alpha, X) \cap (\beta, Y)$$

whenever $(\alpha, X)$ and $(\beta, Y)$ are in $\mathsf{R}(D, A)_T$.

**Lemma 12.** *Let $(\alpha, X), (\beta, Y) \in \mathsf{R}(D, A)_T$ and let $f \in T$. If $f \in (\alpha, X) \cap (\beta, Y)$, then $f \in (\alpha, X) \wedge\wedge (\beta, Y)$.*

*Proof.* This is immediate from the fact that

$$
\overline{X \cap T}^{\alpha} \cap \overline{Y \cap T}^{\beta} \cap T \subseteq \overline{\overline{X \cap T}^{\alpha} \cap \overline{Y \cap T}^{\beta} \cap T}^{\alpha \cap \beta}.
$$

### **5.2 Preservation of the Failure in the Lattices $\mathsf{R}(D, A)_T$**

Recall that $v : X \longrightarrow \mathsf{R}(D, A)$ is the valuation that we have fixed.

**Definition 13.** *For a subset $T$ of $D^A$, the valuation $v_T : X \longrightarrow \mathsf{R}(D, A)_T$ is defined by the formula $v_T(x) = v(x)^{\circ}$, for each $x \in X$.*

More explicitly, we have

$$v_T(x) := \left(\alpha, \overline{X \cap T}^{\alpha}\right), \quad \text{where } (\alpha, X) = v(x).$$

The valuation $v_T$ takes values in $\mathsf{R}(D, A)_T$, while $v$ takes values in $\mathsf{R}(D, A)$. It is possible then to evaluate a lattice term $t$ in $\mathsf{R}(D, A)_T$ using $v_T$ and to evaluate it in $\mathsf{R}(D, A)$ using $v$. To improve readability, we shall use the notation $[t]_T$ for the result of evaluating the term in $\mathsf{R}(D, A)_T$, and the notation $[t]$ for the result of evaluating it in $\mathsf{R}(D, A)$. Since both $[t]$ and $[t]_T$ can be viewed as subsets of $A \cup D^A$, it is possible to compare them using inclusion.

**Lemma 14.** *The relation $[s]_T \subseteq [s]$ holds, for each $T \subseteq D^A$ and each lattice term $s$.*

*Proof.* The proof of the Lemma is a straightforward induction, considering that $v_T(x) \subseteq v(x)$ for all $x \in X$. For example, using $[s_i]_T \subseteq [s_i]$, for $i = 1, 2$,

$$[s_1 \wedge s_2]_T = [s_1]_T \wedge\wedge [s_2]_T \subseteq [s_1]_T \cap [s_2]_T \subseteq [s_1] \cap [s_2] = [s_1 \wedge s_2].$$

A straightforward induction also yields:

**Lemma 15.** *Let $T \subseteq D^A$ be a finite subset, let $t$ be a lattice term and suppose that $[t] = (\beta, Y)$. Then $[t]_T$ is of the form $(\beta, Y')$ for some $Y' \subseteq D^A$.*

**Definition 16.** *Let us define, for each term $t$ and $f \in D^A$ such that $f \in [t]$, a finite set $T(f, t) \subseteq D^A$ as follows:*


Obviously, we have:

**Lemma 17.** *For each lattice term $t$ and $f \in D^A$ such that $f \in [t]$, $f \in T(f, t)$.*

**Proposition 18.** *For each lattice term $t$ and $f \in D^A$ such that $f \in [t]$, if $T(f, t) \subseteq T$, then $f \in [t]_T$.*

*Proof.* We prove the statement by induction on t.


$$f \in [s_1]_T \wedge\wedge [s_2]_T = [s_1 \wedge s_2]_T.$$

– Suppose $t = s_1 \vee s_2$ and $f \in [s_1 \vee s_2]$; let also $(\beta_i, Y_i) := [s_i]$ for $i = 1, 2$. We have defined $T(f, t) := \{f\} \cup T(g, s_i)$ for some $i \in \{1, 2\}$ and for some $g \in [s_i]$ such that $\delta(f, g) \subseteq \beta_1 \cup \beta_2$. Now $g \in T(g, s_i) \subseteq T(f, t) \subseteq T$ so, by the induction hypothesis, $g \in [s_i]_T$. According to Lemma 15, for each $i = 1, 2$, $[s_i]_T$ is of the form $(\beta_i, Y'_i)$ for some subset $Y'_i \subseteq D^A$. Therefore $\delta(f, g) \subseteq \beta_1 \cup \beta_2$ and $g \in [s_i]_T$ implies

$$f \in [s_1]_T \vee [s_2]_T = [s_1 \vee s_2]_T.$$

**Proposition 19.** *Suppose $f$ witnesses the failure of the inclusion $t \leq s$ in $\mathsf{R}(D, A)$ w.r.t. the valuation $v$. Then, for each subset $T \subseteq D^A$ such that $T(f, t) \subseteq T$, $f$ witnesses the failure of the inclusion $t \leq s$ in the lattice $\mathsf{R}(D, A)_T$ w.r.t. the valuation $v_T$.*

*Proof.* As $f$ witnesses the failure of $t \leq s$ in $\mathsf{R}(D, A)$, $f \in [t]$ and $f \notin [s]$. By Proposition 18, $f \in [t]_T$. If $f \in [s]_T$, then $[s]_T \subseteq [s]$ (Lemma 14) implies $f \in [s]$, a contradiction. Therefore $f \notin [s]_T$, so $f$ witnesses the failure of $t \leq s$ in $\mathsf{R}(D, A)_T$.

### **5.3 Preservation of the Failure in a Finite Lattice $\mathsf{L}(X, \delta)$**

From now on, we suppose that $T \subseteq D^A$ is finite and $T(f, t) \subseteq T$, with $f$ witnessing the failure of $t \leq s$. Consider the sub-Boolean-algebra of $P(A)$ generated by the sets

$$\left\{ \delta(f, g) \mid f, g \in T \right\} \cup \left\{ A \cap v(x) \mid x \in Vars(t, s) \right\}.\tag{6}$$

We call this Boolean algebra $\mathsf{B}$ (notice, however, the dependency of this definition on $T$, as well as on $t$, $s$ and $v$). It is well known that a Boolean algebra generated by a finite set is finite.

*Remark 20.* If $n = \mathrm{card}(T)$ and $m = \mathrm{card}(Vars(t, s))$, then $\mathsf{B}$ can have at most $2^{\frac{n(n-1)}{2} + m}$ atoms. If we let $k$ be the maximum of the sizes of $t$ and $s$, then, for $T = T(f, t)$, both $n \leq k$ and $m \leq 2k$. We obtain in this case the overapproximation $2^{\frac{k^2 + 3k}{2}}$ on the number of atoms of $\mathsf{B}$.
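For concreteness (an illustrative computation, not part of the original remark), with $k = 3$ the bounds $n \leq k$ and $m \leq 2k$ give

$$\frac{n(n-1)}{2} + m \;\leq\; \frac{3 \cdot 2}{2} + 6 \;=\; 9, \qquad \text{so } \mathsf{B} \text{ has at most } 2^9 = 512 \text{ atoms}.$$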

Let us also recall that $\mathsf{B}$ is isomorphic to the powerset $P(at(\mathsf{B}))$, where $at(\mathsf{B})$ is the set of atoms of $\mathsf{B}$. Let $i : P(at(\mathsf{B})) \longrightarrow P(A)$ be an injective homomorphism of Boolean algebras whose image is $\mathsf{B}$. Since $\delta(f, g) \in \mathsf{B}$ for every $f, g \in T$, we can transform the metric space $(T, \delta)$ induced from $(D^A, \delta)$ into a metric space $(T, \delta_{at(\mathsf{B})})$ whose distance takes values in the powerset algebra $P(at(\mathsf{B}))$:

$$\delta_{at(\mathsf{B})}(f, g) = \beta \quad \text{if and only if} \quad \delta(f, g) = i(\beta).$$

Recall from Proposition 8 that there is a lattice embedding $i_* : \mathsf{L}(T, \delta_{at(\mathsf{B})}) \longrightarrow \mathsf{L}(T, \delta)$, defined in the obvious way: $i_*(\beta, Y) = (i(\beta), Y)$.

**Proposition 21.** *If $f$ witnesses the failure of the inclusion $t \leq s$ in $\mathsf{R}(D, A)$ w.r.t. the valuation $v$, then the same inclusion fails in all the lattices $\mathsf{L}(T, \delta_{at(\mathsf{B})})$, where $T$ is a finite set and $T(f, t) \subseteq T$.*

*Proof.* By Proposition 19 the inclusion $t \leq s$ fails in the lattice $\mathsf{R}(D, A)_T$. This lattice is isomorphic to the lattice $\mathsf{L}(T, \delta)$ via the map sending $(\alpha, X) \in \mathsf{R}(D, A)_T$ to $(\alpha, X \cap T)$. Up to this isomorphism, it is seen that (the restriction to the variables of $t$ and $s$ of) the valuation $v_T$ takes values in the image of the lattice $\mathsf{L}(T, \delta_{at(\mathsf{B})})$ via $i_*$, so $[t]_T$ and $[s]_T$ belong to this sublattice and the inclusion fails in this sublattice, and therefore also in $\mathsf{L}(T, \delta_{at(\mathsf{B})})$.

### **6 Preservation of the Failure in a Finite Lattice $\mathsf{L}(\mathsf{Sec}_\pi)$**

We have seen up to now that if $t \leq s$ fails in $\mathsf{R}(D, A)$, then it fails in many lattices of the form $\mathsf{L}(T, \delta_{at(\mathsf{B})})$. Yet it is not obvious a priori that any of these lattices belongs to the variety generated by the relational lattices. We show in this section that we can extend any $T$ to a finite set $\mathsf{G}$ while keeping $\mathsf{B}$ fixed, so that $(\mathsf{G}, \delta_{at(\mathsf{B})})$ is a pairwise-complete space over $P(at(\mathsf{B}))$. Thus, the inclusion $t \leq s$ fails in the finite lattice $\mathsf{L}(\mathsf{G}, \delta_{at(\mathsf{B})})$. Since $(\mathsf{G}, \delta_{at(\mathsf{B})})$ is isomorphic to a space of the form $\mathsf{Sec}_\pi$ with $\pi : E \longrightarrow at(\mathsf{B})$, the inclusion $t \leq s$ fails in a lattice $\mathsf{L}(\mathsf{Sec}_\pi)$, which, as we have seen, belongs to the variety generated by the relational lattices. This also leads to the construction of a finite relational lattice $\mathsf{R}(E, at(\mathsf{B}))$ in which the inclusion $t \leq s$ fails. By following the chain of constructions, the sizes of $at(\mathsf{B})$ and $E$ can also be estimated, leading to decidability of the equational theory of relational lattices.

**Definition 22.** *A* glue of $T$ and $\mathsf{B}$ *is a function $g \in D^A$ such that, for all $\alpha \in at(\mathsf{B})$, there exists $f \in T$ with $f\restriction_\alpha = g\restriction_\alpha$. We denote by $\mathsf{G}$ the set of all functions that are glues of $T$ and $\mathsf{B}$.*

Observe that $T \subseteq \mathsf{G}$ and that $\mathsf{G}$ is finite, with

$$\mathrm{card}(\mathsf{G}) \le \mathrm{card}(T)^{\mathrm{card}(at(\mathsf{B}))}.\tag{7}$$

In order to prove the following Lemma let, for each $\alpha \in at(\mathsf{B})$ and $g \in \mathsf{G}$, $f(g, \alpha) \in T$ be such that $g\restriction_\alpha = f(g, \alpha)\restriction_\alpha$.

**Lemma 23.** *If $g_1, g_2 \in \mathsf{G}$, then $\delta(g_1, g_2) \in \mathsf{B}$.*

*Proof.*

$$\delta(g_1, g_2) = \bigcup_{\alpha \in at(\mathsf{B})} (\alpha \cap \delta(g_1, g_2)) = \bigcup_{\alpha \in at(\mathsf{B})} (\alpha \cap \delta(f(g_1, \alpha), f(g_2, \alpha))).$$

Since $\delta(f(g_1, \alpha), f(g_2, \alpha)) \in \mathsf{B}$ and $\alpha$ is an atom of $\mathsf{B}$, each expression of the form $\alpha \cap \delta(f(g_1, \alpha), f(g_2, \alpha))$ is either $\emptyset$ or $\alpha$. It follows that $\delta(g_1, g_2) \in \mathsf{B}$.

For a Boolean subalgebra $\mathsf{B}$ of $P(A)$, we say that a subset $T$ of $D^A$ is *pairwise-complete relative to* $\mathsf{B}$ if, for each $f, g \in T$,

1. $\delta(f, g) \in \mathsf{B}$,
2. for each $\beta, \gamma \in \mathsf{B}$, $\delta(f, g) \subseteq \beta \cup \gamma$ implies $\delta(f, h) \subseteq \beta$ and $\delta(h, g) \subseteq \gamma$ for some $h \in T$.

**Lemma 24.** *The set $\mathsf{G}$ is pairwise-complete relative to the Boolean algebra $\mathsf{B}$.*

*Proof.* Let $f, g \in \mathsf{G}$ be such that $\delta(f, g) \subseteq \beta \cup \gamma$. Let $h \in D^A$ be defined so that, for each $\alpha \in at(\mathsf{B})$, $h\restriction_\alpha = f\restriction_\alpha$ if $\alpha \not\subseteq \beta$ and $h\restriction_\alpha = g\restriction_\alpha$ otherwise. Obviously, $h \in \mathsf{G}$.

Observe that $\alpha \not\subseteq \beta$ if and only if $\alpha \subseteq \beta^c$, for each $\alpha \in at(\mathsf{B})$, since $\beta \in \mathsf{B}$. We therefore deduce that $h\restriction_\alpha = f\restriction_\alpha$ if $\alpha \in at(\mathsf{B})$ and $\alpha \subseteq \beta^c$, so $f(a) = h(a)$ for each $a \in \beta^c$. Consequently $\beta^c \subseteq Eq(f, h)$ and $\delta(f, h) \subseteq \beta$.

We also have $h\restriction_\alpha = g\restriction_\alpha$ if $\alpha \in at(\mathsf{B})$ and $\alpha \subseteq \gamma^c$. As before, this implies $\delta(h, g) \subseteq \gamma$. Indeed, this is the case if $\alpha \subseteq \beta$, by definition of $h$. Suppose now that $\alpha \not\subseteq \beta$, so $\alpha \subseteq \beta^c \cap \gamma^c = (\beta \cup \gamma)^c$. Since $\delta(f, g) \subseteq \beta \cup \gamma$, then $\alpha \subseteq \delta(f, g)^c = Eq(f, g)$, i.e. $f\restriction_\alpha = g\restriction_\alpha$. Together with $h\restriction_\alpha = f\restriction_\alpha$ (by definition of $h$) we obtain $h\restriction_\alpha = g\restriction_\alpha$.

We can finally bring together the observations developed so far and state our main results.

**Theorem 25.** *If an inclusion $t \leq s$ fails in some lattice $\mathsf{R}(D, A)$, then it fails in a finite lattice $\mathsf{R}(E, A')$, where $\mathrm{card}(A') \leq 2^{p(k)}$ with $k = \max(\mathrm{size}(t), \mathrm{size}(s))$, $p(k) = \frac{k^2 + 3k}{2}$, and $\mathrm{card}(E) \leq \mathrm{size}(t)$.*

*Proof.* By Proposition 19 the inclusion $t \leq s$ fails in all the lattices $\mathsf{R}(D, A)_T$ where $T(f, t) \subseteq T$. Define $\mathsf{B}$ as the Boolean subalgebra of $P(A)$ generated by the sets in display (6) (with $T = T(f, t)$) and $\mathsf{G}$ as the set of glues of $T(f, t)$ and $\mathsf{B}$ as in Definition 22. The inclusion then fails in $\mathsf{R}(D, A)_{\mathsf{G}}$, since $T(f, t) \subseteq \mathsf{G}$, and then in $\mathsf{L}(\mathsf{G}, \delta_{at(\mathsf{B})})$ by Proposition 21. The condition that $\mathsf{G}$ is pairwise-complete relative to $\mathsf{B}$ is equivalent to saying that the space $(\mathsf{G}, \delta_{at(\mathsf{B})})$ is pairwise-complete. This space is therefore isomorphic to a space of the form $\mathsf{Sec}_\pi$ for some surjective $\pi : F \longrightarrow at(\mathsf{B})$, and $t \leq s$ fails in $\mathsf{L}(\mathsf{Sec}_\pi)$.

Equation (7) shows that, for each $\alpha \in at(\mathsf{B})$, $F_\alpha = \pi^{-1}(\alpha)$ has cardinality at most $\mathrm{card}(T(f, t))$, and the size of $t$ is an upper bound for $\mathrm{card}(T(f, t))$. We can therefore embed the space $\mathsf{Sec}_\pi$ into a space of the form $(E^{at(\mathsf{B})}, \delta)$ with the size of $t$ an upper bound for $\mathrm{card}(E)$. The proof of Proposition 6 exhibits $\mathsf{L}(\mathsf{Sec}_\pi)$ as a homomorphic image of a sublattice of $\mathsf{L}(E^{at(\mathsf{B})}, \delta)$, and therefore the inclusion $t \leq s$ also fails within $\mathsf{L}(E^{at(\mathsf{B})}, \delta) = \mathsf{R}(E, at(\mathsf{B}))$. The upper bound on the size of $at(\mathsf{B})$ has been estimated in Remark 20.

*Remark 26.* In the statement of the previous Theorem, the size of the lattice $\mathsf{R}(E, A')$ can be estimated from the sizes of $E$ and $A'$ considering that

$$P(E^{A'}) \subseteq \mathsf{R}(E, A') \subseteq P(A' \cup E^{A'}).$$

An upper bound for $\mathrm{card}(\mathsf{R}(E, A'))$ is therefore $2^{2^{p(k)} + k^{2^{p(k)}}}$, where $p(k)$ is the polynomial of degree 2 as in the statement of the Theorem and $k$ is the maximum of $\mathrm{size}(t)$ and $\mathrm{size}(s)$.

A standard argument yields now:

**Corollary 27.** *The equational theory of the relational lattices is decidable.*

### **7 Conclusions**

We argued that the equational theory of relational lattices is decidable. We achieved this goal by giving a finite (counter)model construction of bounded size.

Our result leaves open other questions that we might ask about relational lattices. We mentioned in the introduction the quest for a complete axiomatic basis for this theory or, at least, the need for a complete deductive system, so as to develop automatic reasoning for the algebra of relational lattices. As part of future research, it is tempting to contribute to achieving this goal using the mathematical insights contained in the decidability proof.

Our result also opens new research directions, in primis the investigation of the complexity of deciding lattice-theoretic equations/inclusions on relational lattices. Of course, the obvious decision procedure arising from the finite model construction is not optimal; a few algebraic considerations already suggest how the decision procedure can be improved.

Also, it would be desirable next to investigate decidability of equational theories in signatures extending the pure lattice signature; many such extensions are proposed in [17]. It is not difficult to adapt the present decidability proof so as to add the header constant to the signature.

A further interesting question is how this result translates back to the field of multidimensional modal logic [15]. We pointed out in [22] how the algebra of relational lattices can be encoded into a multimodal framework; we conjecture that our decidability result yields the decidability of some positive fragments of well-known undecidable logics, such as the products **S5**<sup>n</sup> with <sup>n</sup> <sup>≥</sup> 3. Moreover, connections need to be established with other existing decidability results in modal logic and in database theory [1].

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Graphs and Automata

# **Minimization of Graph Weighted Models over Circular Strings**

Guillaume Rabusseau(B)

Reasoning and Learning Lab, School of Computer Science, McGill University, Montreal, Canada guillaume.rabusseau@mcgill.ca

**Abstract.** Graph weighted models (GWMs) have recently been proposed as a natural generalization of weighted automata over strings, trees and 2-dimensional words to arbitrary families of labeled graphs (and hypergraphs). In this paper, we propose polynomial time algorithms for minimizing and deciding the equivalence of GWMs defined over the family of circular strings on a finite alphabet (GWM<sup>c</sup>s). The study of GWM<sup>c</sup>s is particularly relevant since circular strings can be seen as the simplest family of graphs with cycles. Despite the simplicity of this family and of the corresponding computational model, the minimization problem is considerably more challenging than in the case of weighted automata over strings and trees: while linear algebra tools are overall sufficient to tackle the minimization problem for classical weighted automata (defined over a field), the minimization of GWM<sup>c</sup>s involves fundamental notions from the theory of finite dimensional algebra. We posit that the properties of GWM<sup>c</sup>s unraveled in this paper will prove useful for the study of GWMs defined over richer families of graphs.

### **1 Introduction**

Functions defined over syntactical structures such as strings, trees and graphs are ubiquitous in computer science. Automata models allow one to succinctly represent such functions. In particular, *weighted automata* can efficiently model functions mapping structured objects to values in a semi-ring. Weighted automata have been defined to handle functions whose domain are e.g. strings [9,26], trees [8,16] and 2-dimensional words [11]. More recently, Bailly et al. [2] proposed a computational model for functions mapping labeled graphs (or hypergraphs) to values in a field (see also [22, Chap. 2]): Graph Weighted Models (GWMs). GWMs extend the notion of *linear representation* of a function defined over strings and trees to functions defined over graphs labeled by symbols in a ranked alphabet: loosely speaking, while string weighted automata can be defined by associating each symbol in a finite alphabet to a linear map and tree weighted automata by associating each symbol in a ranked alphabet to a multilinear map, GWMs are defined by associating each arity k symbol from a ranked alphabet to a kth order tensor. The computation of a GWM boils down to mapping each vertex in a graph to the tensor associated to its label and performing contractions directed by the edges of the input graph to obtain a value in the supporting field. When restricted to the families of strings, trees or 2-dimensional words, GWMs are expressively equivalent to the classical notions of weighted automata over these structures.

Weighted automata have recently received interest from the machine learning community due to their ability to represent functions defined over structured objects. Efficient (and often consistent) learning algorithms have been developed for such computational models defined over sequences [3,6,10,19] and trees [1,4,14]. Motivated by the relevance of learning functions defined over richer families of labeled graphs, our long term objective is to design efficient learning algorithms for GWMs. This is however a challenging task. Given the close relationship between minimization and learning for classical weighted automata (see e.g. [7,21,27]), we take a first step in this direction by tackling the problem of minimizing GWMs defined over the simple family of *circular strings*.

Circular strings are strings whose last symbol is connected to the first. A circular string can be seen as a directed graph where each vertex is labeled by a symbol from a finite alphabet and is connected to its unique successor (i.e. a labeled graph composed of a unique cycle). Circular strings are relevant in biology (see e.g. [20] and references therein) and have been studied from a formal language perspective in the non-quantitative setting in [24]. The study of GWMs defined over such graphs is particularly relevant since circular strings are in some sense the simplest family of graphs with cycles (and cycles can be seen as the key obstacle for going from strings and trees to general graphs). Moreover, GWMs defined over the family of circular strings—which we henceforth denote by GWM<sup>c</sup>s to avoid confusion—take a simple form making them easily amenable to theoretical study: a GWM<sup>c</sup> is given by a set of matrices **A**<sup>σ</sup> for each symbol σ in a finite alphabet, and maps any circular string σ1σ<sup>2</sup> ··· σ<sup>k</sup> to the trace of the product of the matrices associated with the letters in the string<sup>1</sup>. Despite the simplicity of this computational model and its strong connection with string weighted automata, the minimization problem is considerably more challenging than in the case of string or tree weighted automata. More precisely, while the minimization problem can easily be handled using notions from linear algebra for e.g. real-valued string weighted automata (see e.g. [7]), we show in this paper that the minimization of GWM<sup>c</sup>s requires fundamental concepts from the theory of finite-dimensional algebras (such as the ones of radical and semi-simplicity).
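As a concrete illustration (a hypothetical sketch, not taken from the paper; the alphabet and matrices `M` below are invented), the following self-contained Python snippet evaluates such a trace-based model on a circular string. Since the trace is invariant under cyclic permutations of a matrix product, the computed value does not depend on where the circular string is "cut open".

```python
# Sketch of a GWM^c evaluation (illustrative; matrices below are invented).
# A GWM^c assigns a square matrix A_sigma to each symbol sigma and maps the
# circular string s_1 s_2 ... s_k to trace(A_{s_1} A_{s_2} ... A_{s_k}).

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def gwm_value(matrices, word):
    """Value computed on the circular string `word` (any linearization)."""
    n = len(next(iter(matrices.values())))
    P = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for symbol in word:
        P = mat_mul(P, matrices[symbol])
    return sum(P[i][i] for i in range(n))  # trace of the product

# Hypothetical 2-dimensional GWM^c over the alphabet {a, b}.
M = {'a': [[1, 1], [0, 1]], 'b': [[2, 0], [0, 3]]}

# trace(AB) = trace(BA): all cyclic rotations of a word get the same value.
print(gwm_value(M, "ab"), gwm_value(M, "ba"))
```

The design point illustrated here is exactly the one the paper exploits: only the cyclic equivalence class of the word matters, not the chosen starting position.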

**Contributions.** Throughout the paper, *we only consider automata defined over a field of characteristic* 0. After introducing notions on weighted automata, GWM<sup>c</sup>s and finite-dimensional algebras in Sect. 2, we first tackle the problem of deciding the equivalence of GWM<sup>c</sup>s in Sect. 3. The study of the equivalence problem is motivated by the simple observation that two minimal GWMs computing

<sup>1</sup> Note that this is not a definition *per se* but rather a consequence of the definition of general GWMs (as introduced in [2,22]): when restricted to the family of circular strings, a GWM is given by a set of matrices and its computation can be succinctly expressed using the trace operator (whereas a general GWM is given by a set of *tensors* and its computation relies on *partial traces*).

the same function are not necessarily related by a change of basis, which is in contrast with a classical result stating that two minimal string weighted automata are equivalent if and only if they are related by a change of basis. Building from this observation, we unravel the fundamental notion of semi-simple GWM<sup>c</sup> and we show that *any function recognizable by a* GWM<sup>c</sup> *can be computed by a semi-simple* GWM<sup>c</sup> (Corollary 1) and that *two semi-simple* GWMcs *of equal dimensions computing the same function are necessarily related by a change of basis* (Corollary 2). These two results naturally give rise to *a polynomial time algorithm to decide whether two* GWM<sup>c</sup>s *are equivalent*. We then move on to the minimization problem in Sect. 4, where we give *a polynomial time minimization algorithm for* GWM<sup>c</sup>s which fundamentally relies on the notion of semi-simple GWM<sup>c</sup> (Corollary 3). While the problem of minimizing a GWM defined over the simple family of circular strings is central to this paper, we see it as a test bed for developing the theory of general GWMs: beyond the minimization and equivalence algorithms we propose, we believe that one of our main contributions is to illustrate how the theory of GWMs will rely on advanced concepts from algebra theory and to unravel fundamental properties that will surely be central to the study of GWMs defined over more general families of graphs (such as the one of semi-simple GWM<sup>c</sup>).

#### **1.1 Notations**

For any integer <sup>n</sup> we let [n] = {1, <sup>2</sup>, ··· , n}. We denote the set of natural numbers by <sup>N</sup> and the fields of real and rational numbers by R and Q respectively. Let F be a field of characteristic 0; we denote by <sup>M</sup>n(F) = <sup>F</sup><sup>n</sup>×<sup>n</sup> the set of all <sup>n</sup>×<sup>n</sup> matrices over <sup>F</sup>. We use lower case bold letters for vectors (e.g. **<sup>v</sup>** <sup>∈</sup> <sup>F</sup><sup>d</sup><sup>1</sup> ) and upper case bold letters for matrices (e.g. **<sup>M</sup>** <sup>∈</sup> <sup>F</sup><sup>d</sup>1×d<sup>2</sup> ). We denote by **<sup>I</sup>**<sup>n</sup> the <sup>n</sup> <sup>×</sup> <sup>n</sup> identity matrix (or simply **I** if the dimension is clear from context). Given a matrix **M** ∈ <sup>F</sup><sup>d</sup>1×d<sup>2</sup> , we denote its entries by **<sup>M</sup>**i,j and we use vec(**M**) <sup>∈</sup> <sup>F</sup><sup>d</sup>1d<sup>2</sup> to denote the column vector obtained by concatenating the columns of **M**. We use ker(**A**) to denote the kernel (or null space) of a matrix **<sup>A</sup>**. Given two matrices **<sup>A</sup>** ∈ Mm(F) and **<sup>B</sup>** ∈ Mn(F) we denote their Kronecker product by **<sup>A</sup>** <sup>⊗</sup> **<sup>B</sup>** ∈ Mmn(F) and their direct sum by **<sup>A</sup>** <sup>⊕</sup> **<sup>B</sup>** ∈ Mm+n(F): **<sup>A</sup>** <sup>⊗</sup> **<sup>B</sup>** is the block matrix with blocks (**A**i,j**B**)i,j and **A** ⊕ **B** is the block diagonal matrix with **A** in the upper diagonal block and **B** in the lower one. We denote by Σ<sup>∗</sup> the set of strings on a finite alphabet Σ and the empty string by λ. We denote by Σ<sup>+</sup> the set of non-empty strings and by Σ<sup>k</sup> the set of all strings of length k.
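The block-matrix descriptions of the Kronecker product and the direct sum can be made concrete with a short Python sketch (purely illustrative; the matrices `A` and `B` are invented examples):

```python
# Kronecker product A ⊗ B: the block matrix with blocks A[i][j] * B.
# Direct sum A ⊕ B: the block diagonal matrix with A and B on the diagonal.

def kron(A, B):
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q]
             for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]

def direct_sum(A, B):
    return ([row + [0] * len(B[0]) for row in A]
            + [[0] * len(A[0]) + row for row in B])

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(kron(A, B))        # 4x4 block matrix with blocks A[i][j] * B
print(direct_sum(A, B))  # 4x4 block diagonal matrix
```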

### **2 Preliminaries**

We first present notions on weighted automata, graph weighted models and finite dimensional algebras. The reader is referred to [9,16,25] for more details on weighted automata theory, to [2] and [22, Chap. 2] for an introduction to graph weighted models, and to [13,17] for a thorough introduction to finite dimensional algebras.

#### **2.1 Weighted Automata and GWMs over Circular Strings**

Let Σ be a finite alphabet. A *weighted finite automaton* (WFA) over a field F with n states is a tuple M = (*α*, {**M**<sup>σ</sup>}<sub>σ∈Σ</sub>, *ω*) where *α*, *ω* ∈ F<sup>n</sup> are the initial and final weight vectors respectively, and **M**<sup>σ</sup> ∈ M<sub>n</sub>(F) is the transition matrix for each symbol σ ∈ Σ. A WFA computes a function f<sub>M</sub> : Σ<sup>∗</sup> → F defined for each word x = x<sub>1</sub>x<sub>2</sub> ··· x<sub>k</sub> ∈ Σ<sup>∗</sup> by

$$f\_M(x) = \alpha^\top \mathbf{M}^{x\_1} \mathbf{M}^{x\_2} \cdots \mathbf{M}^{x\_k} \omega.$$

We will often use the shorthand notation **M**<sup>x</sup> = **M**<sup>x<sub>1</sub></sup>**M**<sup>x<sub>2</sub></sup> ··· **M**<sup>x<sub>k</sub></sup> for any word x = x<sub>1</sub>x<sub>2</sub> ··· x<sub>k</sub> ∈ Σ<sup>∗</sup>. A WFA M with n states is *minimal* if its number of states is minimal, i.e. any WFA M′ such that f<sub>M′</sub> = f<sub>M</sub> has at least n states. We say that a function f : Σ<sup>∗</sup> → ℝ is *WFA-recognizable* if there exists a WFA computing it.
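
For concreteness, the definition above can be sketched in numpy; the WFA below (counting occurrences of the letter a) is an ad hoc illustration, not an example from the paper:

```python
import numpy as np

# A 2-state WFA over {a, b} computing f(x) = number of occurrences of 'a' in x.
alpha = np.array([1.0, 0.0])                     # initial weight vector
omega = np.array([0.0, 1.0])                     # final weight vector
M = {"a": np.array([[1.0, 1.0], [0.0, 1.0]]),    # transition matrices
     "b": np.eye(2)}

def f_wfa(x: str) -> float:
    """Compute alpha^T M^{x_1} ... M^{x_k} omega."""
    v = alpha
    for sigma in x:
        v = v @ M[sigma]
    return float(v @ omega)
```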

Graph weighted models (GWMs) have been introduced as a computational model over arbitrary labeled graphs and hypergraphs in [2]. In this paper, we focus on the simple model of GWMs defined over the family of circular strings. A *circular string* is a string without a beginning or an end; one can think of it as a string closed onto itself (see Fig. 1).

**Fig. 1.** (left) Graph representation of the string abba, where the special vertices labeled with α and ω denote the beginning and end of the string respectively. (right) In contrast, the circular string abba has no beginning and no end; it is thus the same object as, e.g., the circular string baab.

A d*-dimensional GWM* A *over circular strings* (GWM<sup>c</sup>) on Σ is given by a set of matrices {**A**<sup>σ</sup>}<sub>σ∈Σ</sub> ⊂ M<sub>d</sub>(F). It computes a function f<sub>A</sub> : Σ<sup>+</sup> → F defined<sup>2</sup> for each word x = x<sub>1</sub>x<sub>2</sub> ··· x<sub>k</sub> ∈ Σ<sup>+</sup> by

$$f\_A(x) = \text{Tr}(\mathbf{A}^{x\_1}\mathbf{A}^{x\_2}\cdots\mathbf{A}^{x\_k}) = \text{Tr}(\mathbf{A}^x).$$

By invariance of the trace under cyclic permutations, we have f<sub>A</sub>(x<sub>1</sub>x<sub>2</sub> ··· x<sub>k</sub>) = f<sub>A</sub>(x<sub>2</sub>x<sub>3</sub> ··· x<sub>k</sub>x<sub>1</sub>) = f<sub>A</sub>(x<sub>3</sub>x<sub>4</sub> ··· x<sub>k</sub>x<sub>1</sub>x<sub>2</sub>) = ··· . This is in accordance with the

<sup>2</sup> Observe that we exclude the empty string from the domain of f<sup>A</sup>. This is on purpose since f<sup>A</sup>(λ) would be the dimension of <sup>A</sup> (using the convention **<sup>A</sup>**<sup>λ</sup> <sup>=</sup> **<sup>I</sup>**): given two GWM<sup>c</sup> s of different dimensions computing the same function on Σ<sup>+</sup>, we want to consider them as equivalent even though they disagree on λ.

definition of a circular string: for any string x′ obtained by cyclic permutation of the letters of a string x, both x and x′ correspond to the same circular string. Similarly to WFAs, a GWM<sup>c</sup> is *minimal* if its dimension is minimal, and a function f : Σ<sup>+</sup> → F is GWM<sup>c</sup>-recognizable if it can be computed by a GWM<sup>c</sup>.
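
A small numpy sketch (with ad hoc matrices) makes the cyclic invariance of f<sub>A</sub> concrete:

```python
import numpy as np

# A 2-dimensional GWM^c over {a, b}; the matrices are arbitrary illustrations.
A = {"a": np.array([[0.0, 1.0], [1.0, 0.0]]),
     "b": np.array([[2.0, 0.0], [0.0, 3.0]])}

def f_gwm(x: str) -> float:
    """Compute Tr(A^{x_1} ... A^{x_k})."""
    P = np.eye(2)
    for sigma in x:
        P = P @ A[sigma]
    return float(np.trace(P))

# All cyclic shifts of a word get the same value.
shifts = ["abba", "bbaa", "baab", "aabb"]
```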

It is immediate to see that there exist WFA-recognizable functions that are not GWM<sup>c</sup>-recognizable; this is the case of any WFA-recognizable function that is not invariant under cyclic permutations of the letters of a word<sup>3</sup>. In contrast, one can easily show that any GWM<sup>c</sup>-recognizable function is WFA-recognizable. More precisely, we have the following result.

**Proposition 1.** *For any* d*-dimensional GWM*<sup>c</sup> A = {**A**<sup>σ</sup>}<sub>σ∈Σ</sub> *on* Σ*, the WFA* M *with* d<sup>2</sup> *states* (*α*, {**M**<sup>σ</sup>}<sub>σ∈Σ</sub>, *ω*)*, where α* = *ω* = vec(**I**<sub>d</sub>) *and* **M**<sup>σ</sup> = **I**<sub>d</sub> ⊗ **A**<sup>σ</sup> *for each* σ ∈ Σ*, is such that* f<sub>M</sub>(x) = f<sub>A</sub>(x) *for all* x ∈ Σ<sup>∗</sup>*.*

*Proof.* For any w = w<sub>1</sub> ··· w<sub>n</sub> ∈ Σ<sup>∗</sup> we have

$$f\_A(w) = \text{Tr}(\mathbf{A}^w) = \sum\_{i \in [d]} (\mathbf{A}^w)\_{i,i} = \sum\_{i \in [d]} \mathbf{e}\_i^\top \mathbf{A}^w \mathbf{e}\_i$$

where **e**<sub>i</sub> is the i-th vector of the canonical basis of F<sup>d</sup>. Since *α* = *ω* = vec(**I**<sub>d</sub>) is the vector obtained by stacking **e**<sub>1</sub>, ··· , **e**<sub>d</sub>, and **M**<sup>σ</sup> = **I** ⊗ **A**<sup>σ</sup> is the block-diagonal matrix with **A**<sup>σ</sup> repeated d times on the diagonal, one can check that

$$f\_M(w) = \boldsymbol{\alpha}^\top \mathbf{M}^w \boldsymbol{\omega} = \sum\_{i \in [d]} \mathbf{e}\_i^\top \mathbf{A}^w \mathbf{e}\_i = f\_A(w).$$
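
This construction is easy to check numerically (a numpy sketch with an arbitrary 2-dimensional GWM<sup>c</sup>, not an example from the paper):

```python
import numpy as np
from itertools import product

d = 2
A = {"a": np.array([[1.0, 1.0], [0.0, 1.0]]),
     "b": np.array([[0.0, 1.0], [1.0, 0.0]])}

# WFA of Proposition 1: alpha = omega = vec(I_d), M^sigma = I_d (x) A^sigma.
alpha = np.eye(d).flatten(order="F")
M = {s: np.kron(np.eye(d), A[s]) for s in A}

def f_wfa(x):
    v = alpha
    for s in x:
        v = v @ M[s]
    return float(v @ alpha)          # omega = alpha = vec(I_d)

def f_gwm(x):
    P = np.eye(d)
    for s in x:
        P = P @ A[s]
    return float(np.trace(P))
```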

It follows from this proposition that the learning and equivalence problems for GWM<sup>c</sup>s could be handled by using the corresponding algorithms for WFAs. We will nonetheless study the equivalence problem in the next section<sup>4</sup> without falling back onto the theory of WFAs, which will allow us to unravel fundamental properties of GWMs that will be particularly relevant to further studies (moreover, the minimization problem obviously cannot be handled in such a way).

#### **2.2 Finite-Dimensional Algebras**

An *algebra* <sup>A</sup> over a field <sup>F</sup> (or <sup>F</sup>-algebra) is a vector space over the field <sup>F</sup> equipped with a bilinear operation (called multiplication or product). An algebra is *associative* if its product is associative and it is *finite-dimensional* if it is of finite dimension as a vector space over F. *In this paper, we will only consider finite-dimensional associative algebras*. A sub-algebra B of an algebra A is a linear subspace of A which is closed under product (i.e. B equipped with the operations of A is an algebra itself).

A classical example of finite-dimensional algebra is the set L(V ) of linear operators on some finite-dimensional vector space V (where the product is composition). In this particular example, the algebra L(V ) is isomorphic to the *full matrix algebra* <sup>M</sup>d(F), where <sup>d</sup> is the dimension of <sup>V</sup> ; we will mainly focus on matrix algebras in this paper, i.e. sub-algebras of the full matrix algebra <sup>M</sup>d(F) for some d (an example of such an algebra is the set of d × d upper triangular matrices). In particular, we will often consider the *algebra generated by a finite*

<sup>3</sup> Note that this is not a necessary condition: the function f defined on {a, b}<sup>∗</sup> by f(x) = 1 if x <sup>=</sup> a and 0 otherwise is WFA-recognizable but not GWM<sup>c</sup> -recognizable.

<sup>4</sup> The learning problem has been previously considered in [5,22].

*set of matrices* {**A**<sup>σ</sup>}<sub>σ∈Σ</sub> ⊂ M<sub>d</sub>(F) for some finite alphabet Σ, that is, the set of all finite linear combinations of matrices of the form **A**<sup>x</sup> = **A**<sup>x<sub>1</sub></sup>**A**<sup>x<sub>2</sub></sup> ··· **A**<sup>x<sub>k</sub></sup> for x = x<sub>1</sub>x<sub>2</sub> ··· x<sub>k</sub> ∈ Σ<sup>∗</sup>. More formally, if we denote by A this algebra, we have

$$\mathcal{A} = \left\{ \sum\_{i=1}^{n} \alpha\_i \mathbf{A}^{w\_i} : n \in \mathbb{N}, \ \alpha\_1, \dots, \alpha\_n \in \mathbb{F}, \ w\_1, \dots, w\_n \in \Sigma^\* \right\}.$$

Let <sup>A</sup> be a finite-dimensional algebra over <sup>F</sup>. A sub-algebra <sup>X</sup> of <sup>A</sup> is called an *ideal of* A if both xa ∈ X and ax ∈ X for any x ∈ X , a ∈ A (i.e. X is both left and right A-invariant), which we will denote by AX = X A = A. A sub-algebra <sup>X</sup> of <sup>A</sup> is *nilpotent* if there exists some integer <sup>k</sup> such that <sup>X</sup> <sup>k</sup> <sup>=</sup> {x1x<sup>2</sup> ··· x<sup>k</sup> : x<sup>i</sup> ∈ X , i ∈ [k]} = {0}. The *factor algebra* A/X of an algebra A by an ideal X is the algebra consisting of all cosets a + X for a ∈ A, in other words A/X is the quotient of A by the equivalence relation (a ∼ b if and only if <sup>a</sup> <sup>−</sup> <sup>b</sup> ∈ X ). The *radical* <sup>5</sup> *of* <sup>A</sup> is the maximal nilpotent ideal of <sup>A</sup> and will be denoted by Rad(A) (the existence of Rad(A) follows from the fact that A is of finite dimension). An algebra A is *semi-simple* if its radical is {0}.

Let us illustrate these definitions with a very simple example. Let G ⊂ M<sub>2</sub>(ℝ) be the algebra generated by the matrix

$$\mathbf{G} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.$$

One can easily check that

$$\mathcal{G} = \left\{ \begin{bmatrix} \alpha & \beta \\ 0 & \alpha \end{bmatrix} : \alpha, \ \beta \in \mathbb{R} \right\} \text{ and is thus of dimension 2. Consequently, both}$$

$$\mathcal{G}\_1 = \left\{ \begin{bmatrix} \alpha & 0 \\ 0 & \alpha \end{bmatrix} : \ \alpha \in \mathbb{R} \right\} \quad \text{and} \quad \mathcal{G}\_2 = \left\{ \begin{bmatrix} 0 & \beta \\ 0 & 0 \end{bmatrix} : \ \beta \in \mathbb{R} \right\} \tag{1}$$

are sub-algebras of G. Moreover, G<sub>2</sub> is a nilpotent ideal and one can check that it is maximal, i.e. Rad(G) = G<sub>2</sub>, and hence G is not semi-simple.
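
These claims are easy to verify numerically (a numpy sketch; N denotes the generator of G<sub>2</sub>):

```python
import numpy as np

I = np.eye(2)
N = np.array([[0.0, 1.0],
              [0.0, 0.0]])       # generator of the sub-algebra G_2

# G_2 is nilpotent: N^2 = 0.
N2 = N @ N

# G_2 is an ideal: a generic element of G is alpha*I + beta*N,
# and multiplying it by N on either side lands back in G_2.
alpha, beta = 2.0, -1.0
X = alpha * I + beta * N
left, right = X @ N, N @ X       # both equal alpha * N, an element of G_2
```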

Intuitively, the radical of an algebra A contains its *bad elements* (in the sense that these elements annihilate all simple A-modules). In our previous example, this *bad* property translates into the fact that the non-zero elements of G<sub>2</sub> cannot be diagonalized. We will use two fundamental results from the theory of finite-dimensional algebras. The first one is the Wedderburn-Malcev theorem, which states that (under some conditions on the ground field F) the elements of the radical can be *filtered out* from the algebra, i.e. one can find a sub-algebra of A that is isomorphic to A/Rad(A) (see e.g. [17, Theorem 6.2.3]).

**Theorem 1 (Wedderburn-Malcev Theorem).** *Let* A *be a finite-dimensional algebra over a field of characteristic* 0*. There exists a semi-simple sub-algebra* A˜ *of* A *which is isomorphic to* A/Rad(A) *and such that* A = A˜ ⊕ Rad(A) *(direct sum of vector spaces).*

Going back to the example of the algebra G described above, we showed that it is not semi-simple; however, one can easily check that G/Rad(G) is isomorphic to the algebra G<sub>1</sub> in Eq. (1), which is semi-simple, and furthermore that G = G<sub>1</sub> ⊕ Rad(G).

<sup>5</sup> Note that this definition is specific to the finite-dimensional case; for general rings, there exist distinct non-equivalent definitions of radicals, which all agree with the one given here in the case of finite-dimensional algebras.

The second fundamental result we will need is related to the notion of representation of an algebra. A *representation* of an F-algebra A is a homomorphism of A into the algebra L(V) of the linear operators on some vector space V (over F). Two representations ρ: A → L(V) and τ: A → L(W) are *similar* if there exists an isomorphism φ: V → W such that ρ(a) = φ<sup>−1</sup>τ(a)φ for all a ∈ A. For semi-simple algebras, the notion of similar representations is fundamentally related to the trace operator, which will be particularly relevant to the present study. Formally, we have the following theorem (see e.g. [17, Corollary 2.6.3]).

**Theorem 2.** *Let* ρ *and* τ *be two representations of a* semi-simple *algebra* A *over a* field of characteristic 0*. These representations are similar if and only if* Tr(ρ(a)) = Tr(τ (a)) *for all* a ∈ A*.*

### **3 Semi-Simple GWMs and the Equivalence Problem**

In this section, we study the equivalence problem: given two GWMs over circular strings, how can we decide whether they compute the same function? In light of Proposition 1, one could solve this problem by simply *converting* the two GWM<sup>c</sup>s into WFAs and checking whether these two WFAs compute the same function; indeed the equivalence problem for WFAs defined over a field is decidable in polynomial time [9]. Nonetheless, we will tackle this problem without relying on this proposition and, by doing so, we will unravel the notion of *semi-simple* GWM<sup>c</sup> which will be relevant to the study of the minimization problem in the next section (and which should also be central to the study of GWMs defined over more general families of graphs).

#### **3.1 Semi-Simplicity, Nilpotent Matrices and Traces**

Let A be a finite dimensional matrix algebra. Recall that the radical of A is its maximal nilpotent ideal. A useful characterization of the elements of the radical relies on the notion of strongly nilpotent elements: **A** ∈ A is *strongly nilpotent* if **AX** is nilpotent for any **X** ∈ A. It turns out that the radical of A is exactly the set of its strongly nilpotent elements [17, Corollary 3.1.10]. Since the computation of a GWM<sup>c</sup> boils down to applying the trace operator, we will leverage this property to relate the notions of radical and semi-simplicity to simple properties of the elements of A with respect to the trace operator. We start with a simple lemma relating nilpotency and trace.

**Lemma 1.** *Let* <sup>F</sup> *be a field* of characteristic 0 *and let* **<sup>A</sup>** ∈ Md(F)*. Then* **<sup>A</sup>** *is nilpotent if and only if* Tr(**A**<sup>n</sup>)=0 *for all* <sup>n</sup> <sup>≥</sup> <sup>1</sup>*.*

*Proof.* Let **A** be a nilpotent matrix and let k be such that **A**<sup>k</sup> = 0. Suppose **Av** = γ**v** for some **v** ≠ **0** (where γ could belong to an algebraically closed field extension of F). Then **A**<sup>k</sup>**v** = γ<sup>k</sup>**v** = 0, hence γ<sup>k</sup> = 0 and thus γ = 0 since a field contains no non-zero nilpotent elements. Thus **A** has only 0 eigenvalues and Tr(**A**<sup>n</sup>) = 0 for all n ≥ 1.

Conversely, suppose that Tr(**A**<sup>n</sup>) = 0 for all n ≥ 1. Then, we have Tr(P(**A**)) = 0 for any polynomial P with constant term 0. Suppose that **A** has a non-zero eigenvalue γ and let m > 0 be its multiplicity. Choose a polynomial P such that P(γ) = 1, P(0) = 0 and P(μ) = 0 for any eigenvalue μ of **A** distinct from γ. We then have 0 = Tr(P(**A**)) = m · 1<sub>F</sub>, which is non-zero since F has characteristic 0, a contradiction. Hence **A** has only zero eigenvalues and is nilpotent.
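
Lemma 1 is easy to check numerically for small matrices (a numpy sketch; the matrices and thresholds are ad hoc):

```python
import numpy as np

def traces_of_powers(A, up_to):
    """Return [Tr(A), Tr(A^2), ..., Tr(A^up_to)]."""
    P = np.eye(A.shape[0])
    out = []
    for _ in range(up_to):
        P = P @ A
        out.append(float(np.trace(P)))
    return out

# A nilpotent matrix: all traces of powers vanish.
N = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])   # N^3 = 0

# A non-nilpotent matrix: some trace of a power is non-zero.
S = np.diag([1.0, -1.0, 0.0])     # Tr(S) = 0 but Tr(S^2) = 2
```

Note that for a d × d matrix it in fact suffices to check n = 1, ··· , d: in characteristic 0, the traces of the first d powers determine the characteristic polynomial via Newton's identities.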

One can use the previous lemma to show that an element **A** ∈ A is strongly nilpotent if and only if Tr(**AX**) = 0 for all **X** ∈ A, which leads to the following useful characterization of the semi-simplicity of an algebra.

**Proposition 2.** *Let* A⊂Md(F) *be a matrix algebra. We have*

$$\text{Rad}(\mathcal{A}) = \{ \mathbf{A} \in \mathcal{A} \; : \; \text{Tr}(\mathbf{A} \mathbf{X}) = 0 \; \; \text{for all } \mathbf{X} \in \mathcal{A} \}\; .$$

*Consequently,* A *is semi-simple if and only if for all* **A** ∈ A *different from* 0 *there exists* **X** ∈ A *such that* Tr(**AX**) ≠ 0*.*

*Proof.* We will show that **A** ∈ A is strongly nilpotent if and only if Tr(**AX**) = 0 for all **X** ∈ A. The proposition will then directly follow from the fact that Rad(A) is the set of strongly nilpotent elements of A and from the fact that A is semi-simple if and only if Rad(A) = {0}.

Let **A** ∈ A be such that Tr(**AX**) = 0 for all **X** ∈ A. Since **X**(**AX**)<sup>n−1</sup> ∈ A for all n ≥ 1 and all **X** ∈ A, we have Tr((**AX**)<sup>n</sup>) = 0 for all n ≥ 1 and all **X** ∈ A, hence **AX** is nilpotent for all **X** ∈ A by Lemma 1, i.e. **A** is strongly nilpotent. Conversely, let **A** be a strongly nilpotent element of A. By Lemma 1 we have Tr((**AX**)<sup>n</sup>) = 0 for all **X** ∈ A and all n ≥ 1, in particular Tr(**AX**) = 0.
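
Proposition 2 suggests a concrete way to compute the radical of a matrix algebra numerically: given a basis B<sub>1</sub>, ··· , B<sub>n</sub> of A, Rad(A) is the null space of the Gram matrix of the trace bilinear form, T<sub>i,j</sub> = Tr(B<sub>i</sub>B<sub>j</sub>). The numpy sketch below (with the running example G; names and tolerance are ad hoc) recovers Rad(G) = G<sub>2</sub>:

```python
import numpy as np

# Basis of the algebra G from the running example: B1 = I, B2 = N.
B = [np.eye(2),
     np.array([[0.0, 1.0], [0.0, 0.0]])]
n = len(B)

# Gram matrix of the trace bilinear form (A, X) -> Tr(AX).
T = np.array([[np.trace(B[i] @ B[j]) for j in range(n)] for i in range(n)])

# Coordinates of a basis of Rad(G) = null space of T, read off from an SVD.
_, sv, Vt = np.linalg.svd(T)
rad_coords = Vt[sv < 1e-10]

# Reassemble the radical basis element(s) as matrices.
rad = [sum(c * Bi for c, Bi in zip(coords, B)) for coords in rad_coords]
```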

#### **3.2 Equivalence of GWMs**

We now consider the problem of deciding whether two GWM<sup>c</sup>s are equivalent. Let us first briefly show how one can decide whether two real-valued WFAs compute the same function. One way to address this problem relies on the following result: two minimal real-valued WFAs computing the same function are related by a change of basis. Note that it is easy to check that WFAs are invariant under a change of basis of their weight vectors and transition matrices. The following proposition shows that such a change of basis is actually the only way for two minimal WFAs to compute the same function [26] (see also [6, Corollary 4.2]).

**Proposition 3.** *If two WFAs* A = (*α*, {**A**<sup>σ</sup>}<sub>σ∈Σ</sub>, *ω*) *and* A˜ = (*α*˜, {**A**˜<sup>σ</sup>}<sub>σ∈Σ</sub>, *ω*˜) *with* d *states taking their values in* ℝ *are minimal and compute the same function, i.e.* f<sub>A</sub> = f<sub>A˜</sub>*, then there exists an invertible matrix* **P** ∈ M<sub>d</sub>(ℝ) *such that*

$$\boldsymbol{\alpha}^{\top} = \boldsymbol{\tilde{\alpha}}^{\top} \mathbf{P}, \quad \boldsymbol{\omega} = \mathbf{P}^{-1} \boldsymbol{\tilde{\omega}} \quad \text{and} \quad \mathbf{A}^{\sigma} = \mathbf{P}^{-1} \tilde{\mathbf{A}}^{\sigma} \mathbf{P} \text{ for each } \sigma \in \Sigma.$$

Hence, to decide whether two WFAs compute the same function one can simply minimize them and check whether the weight vectors and transition matrices obtained after minimization are related by a change of basis (both of which can be done in polynomial time). In contrast, one can easily find an example of two minimal GWM<sup>c</sup>s whose matrices are not related by a change of basis. Consider the constant function f(x) = 2 for all x ∈ Σ<sup>+</sup>. One can check that the two GWM<sup>c</sup>s G and G˜ of dimension 2 defined by the matrices

$$\mathbf{G} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad \mathbf{\tilde{G}} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

respectively are minimal and compute f; however, **G** and **G**˜ are not similar.
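
This can be verified directly (a numpy sketch reading the example over a one-letter alphabet):

```python
import numpy as np

G  = np.array([[1.0, 1.0],
               [0.0, 1.0]])   # unipotent, not diagonalizable
Gt = np.eye(2)

# Both compute the constant function x -> 2: Tr(G^k) = Tr(Gt^k) = 2 for k >= 1.
traces_G  = [float(np.trace(np.linalg.matrix_power(G,  k))) for k in range(1, 8)]
traces_Gt = [float(np.trace(np.linalg.matrix_power(Gt, k))) for k in range(1, 8)]

# Yet G and Gt cannot be similar: P^{-1} Gt P = P^{-1} P = Gt for every
# invertible P, so similarity would force G = Gt.
```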

Let us now introduce the notion of *semi-simple* GWM<sup>c</sup>. We say that a GWM<sup>c</sup> A defined by a set of matrices {**A**<sup>σ</sup>}<sub>σ∈Σ</sub> ⊂ M<sub>d</sub>(F) is *semi-simple* if the algebra A generated by the matrices {**A**<sup>σ</sup>}<sub>σ∈Σ</sub> is semi-simple. It follows from the example presented in Sect. 2.2 that G is not semi-simple, while G˜ is a semi-simple GWM<sup>c</sup> computing the GWM<sup>c</sup>-recognizable function f. We will now show that this simple example can be generalized: *any* GWM<sup>c</sup>*-recognizable function can be computed by a semi-simple* GWM<sup>c</sup>. This non-trivial result relies on the following theorem, which is a direct consequence of the Wedderburn-Malcev theorem.

**Theorem 3.** *Let* A⊂Md(F) *be a matrix algebra over a field of characteristic* <sup>0</sup>*. Then there exist a* semi-simple *sub-algebra* <sup>A</sup>˜ *of* <sup>A</sup> *and a surjective homomorphism* <sup>π</sup> : A → <sup>A</sup>˜ *such that* Tr(**A**) = Tr(π(**A**)) *for all* **<sup>A</sup>** ∈ A*.*

*Proof.* By Theorem 1 there exists a semi-simple sub-algebra A˜ of A which is isomorphic to A/Rad(A) and such that A = A˜ ⊕ Rad(A) (direct sum of vector spaces). Let π : A → A˜ be the projection associated with this direct sum. Then for any **A** ∈ A we have

$$\operatorname{Tr}(\mathbf{A}) = \operatorname{Tr}(\pi(\mathbf{A}) + (1 - \pi)(\mathbf{A})) = \operatorname{Tr}(\pi(\mathbf{A})) + \operatorname{Tr}((1 - \pi)(\mathbf{A})) = \operatorname{Tr}(\pi(\mathbf{A})).$$

Indeed, since (1 − π)(**A**) ∈ Rad(A), it is nilpotent, hence its trace is zero.

Using the notations of Theorem 3, it follows that for any d-dimensional GWM<sup>c</sup> A given by a set of matrices {**A**<sup>σ</sup>}<sub>σ∈Σ</sub> ⊂ M<sub>d</sub>(F) generating the algebra A, the d-dimensional GWM<sup>c</sup> A˜ given by the matrices {**A**˜<sup>σ</sup> = π(**A**<sup>σ</sup>)}<sub>σ∈Σ</sub> is a semi-simple GWM<sup>c</sup> computing the function f<sub>A</sub>, hence the following corollary.

**Corollary 1.** *Any function that can be computed by a GWM*<sup>c</sup> *can be computed by a semi-simple GWM*<sup>c</sup> *of the same dimension.*

Given a finite-dimensional algebra A, one can compute the surjective homomorphism π from Theorem 3 in polynomial time when F allows efficient arithmetic computations (e.g. F = ℚ) [12,15]. The algorithm takes as input a basis a<sub>1</sub>, ··· , a<sub>n</sub> of A (as a vector space) and the structure coefficients of the algebra (which are the scalars c<sup>k</sup><sub>i,j</sub> ∈ F satisfying a<sub>i</sub>a<sub>j</sub> = Σ<sub>k</sub> c<sup>k</sup><sub>i,j</sub>a<sub>k</sub>). Since one can easily compute a basis and the structure coefficients of a matrix algebra A given a set of generators {**A**<sup>σ</sup>}<sub>σ∈Σ</sub> in polynomial time, it follows that any GWM<sup>c</sup> can be transformed in polynomial time into a semi-simple GWM<sup>c</sup> (of the same dimension) computing the same function.
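
For the running example G, the homomorphism π is simple enough to write down by hand (a hypothetical numpy sketch; the general algorithm of [12,15] is considerably more involved): every element of G is αI + βN, and π maps it to αI, discarding the radical part.

```python
import numpy as np

N = np.array([[0.0, 1.0],
              [0.0, 0.0]])

def pi(A):
    """Projection G -> G_1 along Rad(G): alpha*I + beta*N |-> alpha*I."""
    return A[0, 0] * np.eye(2)

G = np.eye(2) + N          # the generator of G

# pi preserves traces and is multiplicative on G.
G2 = G @ G
```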

We now show that a result similar to Proposition 3 holds for semi-simple GWM<sup>c</sup>s: *two semi-simple* d*-dimensional* GWM<sup>c</sup>*s are equivalent if and only if they are related by a change of basis*. This result relies on the following theorem.

**Theorem 4.** *Let* Σ *be a finite alphabet and let* A, B ⊂ M<sub>d</sub>(F) *be the algebras generated by the sets of matrices* {**A**<sup>σ</sup>}<sub>σ∈Σ</sub> *and* {**B**<sup>σ</sup>}<sub>σ∈Σ</sub> *respectively.*

*If* <sup>A</sup> *and* <sup>B</sup> *are semi-simple and* Tr(**A**w) = Tr(**B**w) *for all* <sup>w</sup> <sup>∈</sup> <sup>Σ</sup><sup>∗</sup> *then* <sup>A</sup> *is isomorphic to* <sup>B</sup>*. Moreover, the mapping* <sup>φ</sup>˜: A→B *defined by extending the mapping*

$$
\phi \colon \mathbf{A}^x \mapsto \mathbf{B}^x \quad \text{for all } x \in \Sigma^\*
$$

*by linearity is well-defined and is an isomorphism.*

*Proof.* The mapping φ is by construction a trace-preserving surjective semigroup homomorphism. We first show<sup>6</sup> that φ can be extended to a homomorphism φ˜: A → B. By definition, any **A** ∈ A can be written as **A** = Σ<sup>n</sup><sub>i=1</sub> α<sub>i</sub>**A**<sup>x<sub>i</sub></sup> for some n ∈ ℕ, α<sub>1</sub>, ··· , α<sub>n</sub> ∈ F, x<sub>1</sub>, ··· , x<sub>n</sub> ∈ Σ<sup>∗</sup>. We will show that the mapping

$$\tilde{\phi} \colon \sum\_{i=1}^n \alpha\_i \mathbf{A}^{x\_i} \longmapsto \sum\_{i=1}^n \alpha\_i \phi(\mathbf{A}^{x\_i})$$

is well-defined. By construction of φ˜, it suffices to show that if Σ<sup>n</sup><sub>i=1</sub> α<sub>i</sub>**A**<sup>x<sub>i</sub></sup> = 0 for some α<sub>i</sub> ∈ F, x<sub>i</sub> ∈ Σ<sup>∗</sup>, then φ˜(Σ<sup>n</sup><sub>i=1</sub> α<sub>i</sub>**A**<sup>x<sub>i</sub></sup>) = 0. Suppose Σ<sup>n</sup><sub>i=1</sub> α<sub>i</sub>**A**<sup>x<sub>i</sub></sup> = 0; then Σ<sup>n</sup><sub>i=1</sub> α<sub>i</sub>**A**<sup>x<sub>i</sub></sup>**A**<sup>x</sup> = 0 for any x ∈ Σ<sup>∗</sup>. By linearity of the trace and since φ is a trace-preserving morphism, it follows that

$$\begin{aligned} 0 &= \sum\_{i=1}^{n} \alpha\_i \text{Tr} \left[ \mathbf{A}^{x\_i} \mathbf{A}^x \right] = \sum\_{i=1}^{n} \alpha\_i \text{Tr} \left[ \phi(\mathbf{A}^{x\_i} \mathbf{A}^x) \right] = \sum\_{i=1}^{n} \alpha\_i \text{Tr} \left[ \phi(\mathbf{A}^{x\_i}) \phi(\mathbf{A}^x) \right] \\ &= \text{Tr} \left[ \left( \sum\_{i=1}^{n} \alpha\_i \phi(\mathbf{A}^{x\_i}) \right) \phi(\mathbf{A}^x) \right] = \text{Tr} \left[ \tilde{\phi} \left( \sum\_{i=1}^{n} \alpha\_i \mathbf{A}^{x\_i} \right) \phi(\mathbf{A}^x) \right] \end{aligned}$$

for all x ∈ Σ<sup>∗</sup>. By linearity of the trace and since φ is surjective, we thus have Tr[φ˜(Σ<sup>n</sup><sub>i=1</sub> α<sub>i</sub>**A**<sup>x<sub>i</sub></sup>)**B**] = 0 for any **B** ∈ B, hence φ˜(Σ<sup>n</sup><sub>i=1</sub> α<sub>i</sub>**A**<sup>x<sub>i</sub></sup>) belongs to Rad(B) by Proposition 2 and must be 0 since B is semi-simple.

One can easily check that φ˜ is trace-preserving, is surjective and is a homomorphism. It remains to show that <sup>φ</sup>˜ is injective. Let **<sup>A</sup>** ∈ A be such that <sup>φ</sup>˜(**A**) = 0. Since <sup>φ</sup>˜ is a homomorphism we have <sup>φ</sup>˜(**AX**) = 0 for any **<sup>X</sup>** ∈ A, and thus 0 = Tr(φ˜(**AX**)) = Tr(**AX**) for all **<sup>X</sup>** ∈ A. Hence **<sup>A</sup>** <sup>∈</sup> Rad(A) by Proposition 2 and must be 0 since A is semi-simple.

The previous theorem can be leveraged to show that if two semi-simple GWM<sup>c</sup>s of the same dimension compute the same function, then they are related by a change of basis (note that the converse of this statement is immediate since the trace is a basis-independent operator). Let A and B be two d-dimensional semi-simple GWM<sup>c</sup>s computing the same function and let A, B ⊂ M<sub>d</sub>(F) be the algebras generated by their respective sets of matrices {**A**<sup>σ</sup>}<sub>σ∈Σ</sub> and {**B**<sup>σ</sup>}<sub>σ∈Σ</sub>. First observe that the identity mapping ρ: A → L(F<sup>d</sup>) defined by ρ(**A**) = **A** for all **A** ∈ A is (trivially) a representation of the algebra A. Now, since A and B

<sup>6</sup> This part of the proof is adapted from the proof of Proposition 3.1 in [18].

compute the same function and are semi-simple, we have Tr(**A**<sup>w</sup>) = Tr(**B**<sup>w</sup>) for all w ∈ Σ<sup>∗</sup> and it follows from Theorem 4 that A is isomorphic to B; let φ˜: A → B be the isomorphism defined in this theorem. Then, the mapping τ : A → L(F<sup>d</sup>) defined by τ(**A**) = φ˜(**A**) for all **A** ∈ A is also a representation of A, and since A is semi-simple it follows from Theorem 2 that ρ and τ are similar. That is, there exists an invertible matrix **P** ∈ M<sub>d</sub>(F) such that ρ(**A**) = **P**<sup>−1</sup>τ(**A**)**P** for all **A** ∈ A. In particular we have

$$\mathbf{A}^{\sigma} = \rho(\mathbf{A}^{\sigma}) = \mathbf{P}^{-1}\tau(\mathbf{A}^{\sigma})\mathbf{P} = \mathbf{P}^{-1}\tilde{\phi}(\mathbf{A}^{\sigma})\mathbf{P} = \mathbf{P}^{-1}\mathbf{B}^{\sigma}\mathbf{P}$$

for all σ ∈ Σ, hence the following corollary.

**Corollary 2.** *Two* d*-dimensional semi-simple GWM*c*s* A *and* B *compute the same function if and only if they are related by a change of basis, i.e. there exists an invertible matrix* **<sup>P</sup>** ∈ Md(F) *such that* **<sup>A</sup>**<sup>σ</sup> <sup>=</sup> **<sup>P</sup>**−<sup>1</sup>**B**<sup>σ</sup>**<sup>P</sup>** *for all* <sup>σ</sup> <sup>∈</sup> <sup>Σ</sup>*.*

In the case where F allows for efficient arithmetic computations (e.g. F = ℚ), it follows that the equivalence of GWM<sup>c</sup>s can be decided in polynomial time. Indeed, given two GWM<sup>c</sup>s A and B of the same dimension defined by the matrices {**A**<sup>σ</sup>}<sub>σ∈Σ</sub> and {**B**<sup>σ</sup>}<sub>σ∈Σ</sub> respectively, one can first transform them into semi-simple GWM<sup>c</sup>s using Theorem 3 and the algorithm in [12,15], and then check whether the resulting matrices are related by a change of basis. The case where the two GWM<sup>c</sup>s are not of the same dimension can be easily handled. Without loss of generality, suppose that A and B are semi-simple GWM<sup>c</sup>s of dimension d and d′ respectively with d′ < d. One can construct a d-dimensional GWM<sup>c</sup> B˜ computing the same function as B by considering the block-diagonal matrices **B**˜<sup>σ</sup> = **B**<sup>σ</sup> ⊕ **0** for each σ ∈ Σ (where **0** is the (d − d′) × (d − d′) matrix with all entries equal to 0). It is easy to check that B˜ is semi-simple if B is semi-simple, hence one can decide whether A is equivalent to B by checking whether the matrices **A**<sup>σ</sup> and **B**˜<sup>σ</sup> are related by a change of basis.
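
The zero-padding step can be sketched in numpy (an ad hoc one-letter example: a 1-dimensional GWM<sup>c</sup> padded to dimension 2 without changing the computed function):

```python
import numpy as np

B_sigma = np.array([[1.0]])     # 1-dimensional GWM^c: f(x) = Tr(1^k) = 1
d, d_small = 2, 1

# Pad with a zero block: Bt_sigma = B_sigma (+) 0.
Bt_sigma = np.zeros((d, d))
Bt_sigma[:d_small, :d_small] = B_sigma

# Traces of all word matrices (hence the computed function) are unchanged.
traces_B  = [float(np.trace(np.linalg.matrix_power(B_sigma,  k))) for k in range(1, 6)]
traces_Bt = [float(np.trace(np.linalg.matrix_power(Bt_sigma, k))) for k in range(1, 6)]
```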

### **4 Minimization of GWMs over Circular Strings**

We now consider the minimization problem: given a GWM<sup>c</sup> A, can we find a minimal GWM<sup>c</sup> computing f<sub>A</sub>? We will show that the answer is positive and that such a minimal GWM<sup>c</sup> can be computed in polynomial time. We start with a technical lemma that generalizes the classical result stating that for any d × d matrix **A**, the kernel of **A**<sup>d</sup> is equal to the kernel of **A**<sup>d+k</sup> for any k ≥ 0.

**Lemma 2.** *Let* {**A**<sup>σ</sup>}<sup>σ</sup>∈<sup>Σ</sup> ⊂ Md(F) *be a finite set of matrices. Then for all* k ≥ 0 *we have*

$$\bigcap\_{x \in \Sigma^d} \ker(\mathbf{A}^x) = \bigcap\_{y \in \Sigma^{d+k}} \ker(\mathbf{A}^y).$$

*Proof.* For any integer i, let E<sub>i</sub> = ∩<sub>x∈Σ<sup>i</sup></sub> ker(**A**<sup>x</sup>). We start by showing that if E<sub>i</sub> = E<sub>i+1</sub> for some i then E<sub>i+1</sub> = E<sub>i+2</sub>. The inclusion E<sub>i+1</sub> ⊆ E<sub>i+2</sub> is immediate. Suppose E<sub>i</sub> = E<sub>i+1</sub> for some integer i. If **v** ∈ E<sub>i+2</sub> then **A**<sup>σ</sup>**v** ∈ ker(**A**<sup>x</sup>) for all x ∈ Σ<sup>i+1</sup> and all σ ∈ Σ, i.e. **A**<sup>σ</sup>**v** ∈ E<sub>i+1</sub> = E<sub>i</sub> for all σ ∈ Σ, which implies **A**<sup>σ</sup>**v** ∈ ker(**A**<sup>y</sup>) for all y ∈ Σ<sup>i</sup> and all σ ∈ Σ, from which **v** ∈ E<sub>i+1</sub> follows directly. To conclude, since each E<sub>i</sub> is a linear subspace of F<sup>d</sup>, E<sub>i</sub> ⊊ E<sub>i+1</sub> implies dim E<sub>i</sub> < dim E<sub>i+1</sub>, hence there must exist an i for which E<sub>i</sub> = E<sub>i+1</sub>, and this i cannot be greater than d.
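
The stabilizing chain E<sub>1</sub> ⊆ E<sub>2</sub> ⊆ ··· can be computed directly: stack all **A**<sup>x</sup> with |x| = i and take the nullity of the stack (a numpy sketch with ad hoc matrices):

```python
import numpy as np
from itertools import product

d = 2
A = {"a": np.array([[0.0, 1.0], [0.0, 0.0]]),
     "b": np.zeros((2, 2))}

def word_matrix(x):
    P = np.eye(d)
    for s in x:
        P = P @ A[s]
    return P

def E_dim(i):
    """dim of the intersection of ker(A^x) over x in Sigma^i (nullity of the stack)."""
    stack = np.vstack([word_matrix("".join(w)) for w in product("ab", repeat=i)])
    return d - np.linalg.matrix_rank(stack)

dims = [E_dim(i) for i in range(1, 5)]   # non-decreasing, stable from i = d on
```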

We show in the following theorem that the linear space E = ∩<sub>x∈Σ<sup>d</sup></sub> ker(**A**<sup>x</sup>) is not relevant to the computation of a GWM<sup>c</sup> A with matrices {**A**<sup>σ</sup>}<sub>σ∈Σ</sub>, i.e. one can project each matrix **A**<sup>σ</sup> onto the orthogonal complement of E without changing the function computed by A.

**Theorem 5.** *Let* <sup>A</sup> *be a GWM*<sup>c</sup> *given by the set of matrices* {**A**<sup>σ</sup>}<sup>σ</sup>∈<sup>Σ</sup> <sup>⊂</sup> <sup>M</sup>d(F)*. Consider the linear space*

$$E = \bigcap\_{x \in \Sigma^d} \ker(\mathbf{A}^x) = \{ \mathbf{v} \in \mathbb{F}^d : \mathbf{A}^x \mathbf{v} = \mathbf{0} \text{ for all } x \in \Sigma^d \}$$

*and let* **Π** ∈ F<sup>d×d</sup> *be the matrix of the orthogonal projection onto* E*.*

*Then, the GWM*<sup>c</sup> Aˆ *given by the matrices* **A**ˆ<sup>σ</sup> = **A**<sup>σ</sup>(**I** − **Π**) *for each* σ ∈ Σ *is such that* f<sub>A</sub> = f<sub>Aˆ</sub>*.*
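
The construction of Theorem 5 can be checked numerically on a small example (a numpy sketch; the 3-dimensional matrices below are ad hoc illustrations for which E = span(e<sub>2</sub>, e<sub>3</sub>)):

```python
import numpy as np
from itertools import product

d = 3
A = {"a": np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]),
     "b": np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])}

def word_matrix(mats, x):
    P = np.eye(d)
    for s in x:
        P = P @ mats[s]
    return P

# Orthonormal basis of E = intersection of ker(A^x) over |x| = d, via SVD.
stack = np.vstack([word_matrix(A, "".join(w)) for w in product("ab", repeat=d)])
_, sv, Vt = np.linalg.svd(stack)
V = Vt[sv < 1e-10].T                    # columns span E
Pi = V @ V.T                            # orthogonal projection onto E

# The projected GWM^c of Theorem 5.
Ahat = {s: A[s] @ (np.eye(d) - Pi) for s in A}

def f(mats, x):
    return float(np.trace(word_matrix(mats, x)))
```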

*Proof.* Let $\mathcal{A}$ be the algebra generated by the matrices $\{\mathbf{A}^\sigma\}\_{\sigma\in\Sigma}$. Let us first observe that $E$ is $\mathcal{A}$-invariant, which follows from Lemma 2. Indeed, if $\mathbf{v} \in E$ and $y \in \Sigma^\*$, we have $\mathbf{A}^x\mathbf{A}^y\mathbf{v} = \mathbf{0}$ for any $x \in \Sigma^d$ (since $|xy| \geq d$), hence $\mathbf{A}^y\mathbf{v} \in E$; the extension to an arbitrary element of $\mathcal{A}$ is immediate by linearity. This implies that, for any $\mathbf{A} \in \mathcal{A}$, we have

$$\mathbf{\Pi}\mathbf{A}\mathbf{\Pi} = \mathbf{A}\mathbf{\Pi} \qquad \text{and} \qquad (\mathbf{I}-\mathbf{\Pi})\mathbf{A}\mathbf{\Pi} = \mathbf{0}.\tag{2}$$

Now, let $k \geq 1$, let $x = x\_1 x\_2 \cdots x\_k \in \Sigma^k$, and let $\mathbf{P}\_1 = \mathbf{\Pi}$ and $\mathbf{P}\_2 = \mathbf{I} - \mathbf{\Pi}$. We can decompose $\mathbf{A}^x$ into

$$\begin{split} \mathbf{A}^{x} &= \prod\_{i=1}^{k} \mathbf{A}^{x\_{i}} = \prod\_{i=1}^{k} \mathbf{A}^{x\_{i}} (\mathbf{P}\_{1} + \mathbf{P}\_{2}) = \sum\_{j\_{1}, \dots, j\_{k} \in \{1, 2\}} \mathbf{A}^{x\_{1}} \mathbf{P}\_{j\_{1}} \mathbf{A}^{x\_{2}} \mathbf{P}\_{j\_{2}} \cdots \mathbf{A}^{x\_{k}} \mathbf{P}\_{j\_{k}} \\ &= \hat{\mathbf{A}}^{x} + \mathbf{A}^{x\_{1}} \mathbf{\Pi} \mathbf{A}^{x\_{2}} \mathbf{\Pi} \cdots \mathbf{A}^{x\_{k}} \mathbf{\Pi} + \sum\_{\substack{j\_{1}, \dots, j\_{k} \in \{1, 2\} \text{ s.t.} \\ \exists r, r' :\ j\_{r} \neq j\_{r'}}} \mathbf{A}^{x\_{1}} \mathbf{P}\_{j\_{1}} \mathbf{A}^{x\_{2}} \mathbf{P}\_{j\_{2}} \cdots \mathbf{A}^{x\_{k}} \mathbf{P}\_{j\_{k}} . \end{split}$$

We will show that the traces of all the summands in this last expression, except for the first one, are equal to $0$. First, using Eq. (2) we have $\mathbf{A}^{x\_1}\mathbf{\Pi}\mathbf{A}^{x\_2}\mathbf{\Pi}\cdots\mathbf{A}^{x\_k}\mathbf{\Pi} = \mathbf{A}^x\mathbf{\Pi}$. Moreover, for any integer $s$ such that $sk \geq d$, we have $(\mathbf{A}^x\mathbf{\Pi})^s = \mathbf{A}^{x^s}\mathbf{\Pi} = \mathbf{0}$ by definition of $E$ and by Lemma 2, thus $\mathbf{A}^x\mathbf{\Pi}$ is nilpotent and its trace is $0$ by Lemma 1. For the remaining terms, let $j\_1, \dots, j\_k \in \{1,2\}$ be not all equal. Let $l \in [k]$ be an index such that $j\_l = 2$ and $j\_{\overline{l+1}} = 1$, where $\overline{l+1} = l+1$ if $l < k$ and $1$ otherwise. Using the invariance of the trace under cyclic permutations of a matrix product, we obtain

$$\begin{aligned} \text{Tr}(\mathbf{A}^{x\_1}\mathbf{P}\_{j\_1}\mathbf{A}^{x\_2}\mathbf{P}\_{j\_2}\cdots\mathbf{A}^{x\_k}\mathbf{P}\_{j\_k}) &= \text{Tr}(\mathbf{A}^{x\_l}\mathbf{P}\_{j\_l}\mathbf{A}^{x\_{\overline{l+1}}}\mathbf{P}\_{j\_{\overline{l+1}}}\cdots) \\ &= \text{Tr}(\mathbf{A}^{x\_l}(\mathbf{I}-\mathbf{\Pi})\mathbf{A}^{x\_{\overline{l+1}}}\mathbf{\Pi}\cdots) = 0 \end{aligned}$$

where we used Eq. (2) again for the last equality. To conclude, we have shown that $\text{Tr}(\mathbf{A}^x) = \text{Tr}(\hat{\mathbf{A}}^x)$ for all $x \in \Sigma^\*$, hence $\mathcal{A}$ and $\hat{\mathcal{A}}$ compute the same function on circular strings.
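Theorem 5 lends itself to a direct numerical check. The sketch below (hypothetical NumPy code; the 3-dimensional matrices are illustrative, chosen so that every length-$3$ product annihilates the lower block) builds $\mathbf{\Pi}$ from an orthonormal basis of $E$ and verifies that $\mathcal{A}$ and $\hat{\mathcal{A}}$ assign the same trace to every short word.

```python
import itertools
import numpy as np

# Hypothetical 3-dimensional GWM^c over {a, b}: every length-3 product kills
# the lower 2x2 block, so E is non-trivial (matrices are illustrative only)
A = {'a': np.array([[2., 0., 0.],
                    [0., 0., 1.],
                    [0., 0., 0.]]),
     'b': np.array([[1., 0., 0.],
                    [0., 0., 0.],
                    [0., 0., 0.]])}
d = 3

def word_matrix(mats, x):
    """A^x for the word x (product of the letter matrices, left to right)."""
    M = np.eye(d)
    for sig in x:
        M = M @ mats[sig]
    return M

# E = intersection of ker(A^x) over all words of length d
stack = np.vstack([word_matrix(A, x) for x in itertools.product('ab', repeat=d)])
_, s, Vt = np.linalg.svd(stack)
Q = Vt[int(np.sum(s > 1e-10)):].T          # orthonormal basis of E
Pi = Q @ Q.T                               # orthogonal projection onto E
A_hat = {sig: M @ (np.eye(d) - Pi) for sig, M in A.items()}

# Theorem 5: both machines assign the same trace, hence the same value, to
# every circular string (checked here on all words of length at most 4)
for k in range(1, 5):
    for x in itertools.product('ab', repeat=k):
        diff = np.trace(word_matrix(A, x)) - np.trace(word_matrix(A_hat, x))
        assert abs(diff) < 1e-8
print("traces agree on all words up to length 4")
```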

We now show that the subspace $E$ from the previous theorem can moreover be used to obtain a characterization of the minimality of a GWM$^c$.

**Theorem 6.** *Let $\mathcal{A}$ be a GWM$^c$ given by the set of matrices $\{\mathbf{A}^\sigma\}\_{\sigma\in\Sigma} \subset \mathcal{M}\_d(\mathbb{F})$. Then, $\mathcal{A}$ is minimal if and only if the linear space*

$$E = \bigcap\_{x \in \Sigma^d} \ker(\mathbf{A}^x) = \{ \mathbf{v} \in \mathbb{F}^d : \mathbf{A}^x \mathbf{v} = \mathbf{0} \text{ for all } x \in \Sigma^d \}$$

*is trivial, i.e.* E = {**0**}*.*

*Proof.* Suppose that $E$ is not trivial and let $\mathbf{\Pi}$ be the matrix of the orthogonal projection onto $E$. Then, the rank $R$ of $\mathbf{I} - \mathbf{\Pi}$ is strictly less than $d$, and there exists an orthogonal matrix $\mathbf{U} \in \mathbb{F}^{d\times R}$ such that $\mathbf{I} - \mathbf{\Pi} = \mathbf{U}\mathbf{U}^\top$. It follows from the previous theorem that, for any non-empty word $x = x\_1 \cdots x\_k$, we have

$$\begin{split} \text{Tr}(\mathbf{A}^{x}) &= \text{Tr}(\mathbf{A}^{x\_{1}}(\mathbf{I}-\Pi)\mathbf{A}^{x\_{2}}(\mathbf{I}-\Pi)\cdots\mathbf{A}^{x\_{k}}(\mathbf{I}-\Pi)) \\ &= \text{Tr}(\mathbf{A}^{x\_{1}}\mathbf{U}\mathbf{U}^{\top}\mathbf{A}^{x\_{2}}\mathbf{U}\mathbf{U}^{\top}\cdots\mathbf{A}^{x\_{k}}\mathbf{U}\mathbf{U}^{\top}) = \text{Tr}((\mathbf{U}^{\top}\mathbf{A}^{x\_{1}}\mathbf{U})(\mathbf{U}^{\top}\mathbf{A}^{x\_{2}}\mathbf{U})\cdots(\mathbf{U}^{\top}\mathbf{A}^{x\_{k}}\mathbf{U})) . \end{split}$$

Hence, the $R$-dimensional GWM$^c$ given by the matrices $\hat{\mathbf{A}}^\sigma = \mathbf{U}^\top\mathbf{A}^\sigma\mathbf{U}$ computes the same function as $\mathcal{A}$, showing that $\mathcal{A}$ is not minimal.

Suppose now that $\mathcal{A}$ is not minimal. Let $\mathcal{B}$ be a GWM$^c$ of dimension $d' < d$, given by the matrices $\{\mathbf{B}^\sigma\}\_{\sigma\in\Sigma}$, such that $f\_\mathcal{B} = f\_\mathcal{A}$. Let $\mathcal{A}$ (resp. $\mathcal{B}$) be the algebra generated by the matrices $\{\mathbf{A}^\sigma\}\_{\sigma\in\Sigma}$ (resp. $\{\mathbf{B}^\sigma\}\_{\sigma\in\Sigma}$). By Corollary 1, we can assume that both $\mathcal{A}$ and $\mathcal{B}$ are semi-simple GWM$^c$s, i.e. that the algebras $\mathcal{A}$ and $\mathcal{B}$ are semi-simple. For each $\sigma \in \Sigma$, let $\hat{\mathbf{B}}^\sigma = \mathbf{B}^\sigma \oplus \mathbf{0} \in \mathbb{F}^{d\times d}$ be the block diagonal matrix having $\mathbf{B}^\sigma$ in the upper diagonal block and $0$'s elsewhere. Let $\hat{\mathcal{B}}$ be the algebra generated by the matrices $\{\hat{\mathbf{B}}^\sigma\}\_{\sigma\in\Sigma} \subset \mathcal{M}\_d(\mathbb{F})$. It is easy to check that the GWM$^c$ $\hat{\mathcal{B}}$ computes the same function as $\mathcal{A}$ and $\mathcal{B}$, and that the algebra $\hat{\mathcal{B}}$ is semi-simple (it is indeed isomorphic to the semi-simple algebra $\mathcal{B}$). It then follows from Corollary 2 that there exists an invertible matrix $\mathbf{P} \in \mathcal{M}\_d(\mathbb{F})$ such that $\mathbf{A}^\sigma = \mathbf{P}\hat{\mathbf{B}}^\sigma\mathbf{P}^{-1}$ for all $\sigma \in \Sigma$.
Let $\mathbf{e}\_d$ be the $d$-th vector of the canonical basis of $\mathbb{F}^d$. By definition of $\hat{\mathbf{B}}^\sigma$ we have $\hat{\mathbf{B}}^\sigma\mathbf{e}\_d = \mathbf{0}$ for any $\sigma \in \Sigma$, and consequently $\mathbf{A}^\sigma\mathbf{P}\mathbf{e}\_d = \mathbf{0}$ for any symbol $\sigma$, showing that $\mathbf{P}\mathbf{e}\_d \in E$; since $\mathbf{P}\mathbf{e}\_d \neq \mathbf{0}$ ($\mathbf{P}$ is invertible), we conclude $E \neq \{\mathbf{0}\}$.

It follows from the two previous theorems that, by restricting the linear operators $\mathbf{A}^\sigma$ of a GWM$^c$ $\mathcal{A}$ to the subspace $E^\perp$, one obtains a minimal GWM$^c$ computing $f\_\mathcal{A}$. We formally state this result in the following corollary.

**Corollary 3.** *Let $\mathcal{A}$ be a GWM$^c$ given by the matrices $\{\mathbf{A}^\sigma\}\_{\sigma\in\Sigma} \subset \mathcal{M}\_d(\mathbb{F})$ and let $\mathbf{\Pi}$ be the matrix of the orthogonal projection onto the space $E = \bigcap\_{x\in\Sigma^d} \ker(\mathbf{A}^x)$. For any orthogonal matrix $\mathbf{U} \in \mathbb{F}^{d\times R}$ such that $\mathbf{I} - \mathbf{\Pi} = \mathbf{U}\mathbf{U}^\top$ (where $R$ is the dimension of $E^\perp$), the $R$-dimensional GWM$^c$ $\hat{\mathcal{A}}$ given by the matrices $\hat{\mathbf{A}}^\sigma = \mathbf{U}^\top\mathbf{A}^\sigma\mathbf{U}$ is a minimal GWM$^c$ computing $f\_\mathcal{A}$.*

*Proof.* Using the invariance of the trace under cyclic permutations of a matrix product, it directly follows from Theorem 5 that $f\_{\hat{\mathcal{A}}} = f\_\mathcal{A}$. Moreover, one can check that $\hat{E} = \bigcap\_{x\in\Sigma^d} \ker(\hat{\mathbf{A}}^x) = \{\mathbf{0}\}$ by construction of the matrices $\hat{\mathbf{A}}^\sigma$, hence $\hat{\mathcal{A}}$ is minimal by Theorem 6.

We showed that a GWM$^c$ can be minimized by restricting its matrices to the subspace $E^\perp$. In order to do so, one needs to compute a basis of $E = \bigcap\_{x\in\Sigma^d} \ker(\mathbf{A}^x)$. This can naively be done by first computing $\ker(\mathbf{A}^x)$ for each $x \in \Sigma^d$ and then computing a basis for the intersection of these linear subspaces; the complexity of this approach is, however, exponential in the dimension $d$. We show in the following proposition that for semi-simple GWM$^c$s, one simply needs to compute a basis of the space $\bigcap\_{\sigma\in\Sigma} \ker(\mathbf{A}^\sigma)$, which can be done in polynomial time (provided that the field $\mathbb{F}$ admits efficient symbolic arithmetic, e.g. $\mathbb{F} = \mathbb{Q}$).

**Proposition 4.** *Let $\mathcal{A} \subset \mathcal{M}\_d(\mathbb{F})$ be the finite dimensional algebra generated by the set of matrices $\{\mathbf{A}^\sigma\}\_{\sigma\in\Sigma}$. Then, if $\mathcal{A}$ is semi-simple, we have*

$$\bigcap\_{x \in \Sigma^d} \ker(\mathbf{A}^x) = \bigcap\_{\sigma \in \Sigma} \ker(\mathbf{A}^\sigma).$$

*Proof.* For any integer $i \geq 1$, let $E\_i = \bigcap\_{x\in\Sigma^i} \ker(\mathbf{A}^x)$. Recall from the proof of Lemma 2 that $E\_i \subseteq E\_{i+1}$ for all $i$, and that $E\_i = E\_{i+1}$ implies $E\_i = E\_{i+k}$ for any integer $k \geq 0$; hence it is sufficient to show that $E\_1 = E\_2$. One can check that each $E\_i$ is $\mathcal{A}$-invariant, i.e. each $E\_i$ is an $\mathcal{A}$-module. Since $\mathcal{A}$ is semi-simple, any $\mathcal{A}$-module is semi-simple [17, Theorem 2.6.2], which implies that if $M$ is an $\mathcal{A}$-module, every submodule $U$ of $M$ has a complement [17, Proposition 2.2.1], i.e. there exists an $\mathcal{A}$-module $V$ such that $M = U \oplus V$. Now, since $E\_1$ is a submodule of the $\mathcal{A}$-module $E\_2$, $E\_1$ has a complement $U$ in $E\_2$, i.e. $U$ is $\mathcal{A}$-invariant and $E\_2 = E\_1 \oplus U$. Let $\mathbf{v} \in U$; we show $\mathbf{v} = \mathbf{0}$. Since $\mathbf{v} \in E\_2$, we have $\mathbf{A}^{\sigma\_1}\mathbf{A}^{\sigma\_2}\mathbf{v} = \mathbf{0}$ for all $\sigma\_1, \sigma\_2 \in \Sigma$, hence $\mathbf{A}^\sigma\mathbf{v} \in E\_1$ for all $\sigma \in \Sigma$. Moreover, we have $\mathbf{A}^\sigma\mathbf{v} \in U$ for all $\sigma \in \Sigma$, since $U$ is $\mathcal{A}$-invariant. It follows that $\mathbf{A}^\sigma\mathbf{v} \in E\_1 \cap U = \{\mathbf{0}\}$, i.e. $\mathbf{A}^\sigma\mathbf{v} = \mathbf{0}$ for all $\sigma \in \Sigma$, hence $\mathbf{v} \in E\_1$; since also $\mathbf{v} \in U$, we have $\mathbf{v} = \mathbf{0}$. To conclude, $U = \{\mathbf{0}\}$, hence $E\_1 = E\_2$.

Since a GWM<sup>c</sup> can be transformed into an equivalent semi-simple GWM<sup>c</sup> in polynomial time (see Corollary 1 and the following discussion), the minimization of a GWM<sup>c</sup> defined over circular strings can be achieved in polynomial time by first converting it to a semi-simple GWM<sup>c</sup> and then applying Corollary 3 with Proposition 4. The overall minimization algorithm is summarized in Algorithm 1.

#### **Algorithm 1.** Minimization of a GWM defined over circular strings

**Input:** A $d$-dimensional GWM$^c$ $\mathcal{A}$ given by a set of matrices $\{\mathbf{A}^\sigma\}\_{\sigma\in\Sigma} \subset \mathcal{M}\_d(\mathbb{F})$.
**Output:** A minimal GWM$^c$ $\hat{\mathcal{A}}$ computing $f\_\mathcal{A}$.
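Combining Corollary 3 with Proposition 4, the computational core of the algorithm is small. The sketch below (hypothetical NumPy code) implements the projection step; it assumes the input has already been converted to a semi-simple GWM$^c$, and it uses floating-point rank computations where the algorithm of the paper would use exact symbolic arithmetic over $\mathbb{Q}$.

```python
import numpy as np

def minimize_gwmc(mats, tol=1e-10):
    """Minimize a semi-simple GWM^c given as a list of d x d matrices.

    Proposition 4: for a semi-simple machine, E reduces to the intersection of
    the kernels of the A_sigma, i.e. the null space of the stacked matrices.
    Corollary 3: with U an orthonormal basis of E^perp, the matrices
    U^T A_sigma U form a minimal GWM^c computing the same function.
    """
    _, s, Vt = np.linalg.svd(np.vstack(mats))
    rank = int(np.sum(s > tol))
    U = Vt[:rank].T                 # orthonormal basis of E^perp (the row space)
    return [U.T @ M @ U for M in mats]

# Toy diagonal (hence semi-simple) machine whose second coordinate is dead
A_a, A_b = np.diag([2., 0.]), np.diag([1., 0.])
B_a, B_b = minimize_gwmc([A_a, A_b])
print(B_a.shape)   # the minimal machine is 1-dimensional
```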


### **5 Conclusion**

We proposed polynomial time algorithms to handle both the minimization and the equivalence problems for GWMs defined over circular strings. By doing so, we unraveled fundamental notions from algebra theory that will be central to the study of GWMs. In particular, the notion of *semi-simple* GWM<sup>c</sup> was paramount to our analysis. Intuitively, semi-simplicity can be thought of as a weak form of minimality: components from the radical do not contribute to the final computation of a GWM<sup>c</sup> (semi-simplification thus corresponds to annihilating these irrelevant components from the algebra, i.e. from the GWM<sup>c</sup>'s dynamics).

The next step is of course to try to extend the results obtained in this paper to GWMs defined over more general families of graphs. One promising direction we are currently investigating relies on extending the central notion of semi-simple GWM<sup>c</sup> to GWMs defined over arbitrary families of labeled graphs: by opening any edge e in a graph G one obtains a graph G<sup>e</sup> with two *free ports* (i.e. edges having one end that is not connected to any vertex) which would be mapped by <sup>a</sup> <sup>d</sup>-dimensional GWM <sup>A</sup> to a matrix **<sup>A</sup>**<sup>G</sup>*<sup>e</sup>* ∈ Md(F) (indeed, a GWM naturally maps any graph with k free ports to a kth order tensor; see [22, Sect. 2.2.3] for more details). For circular strings, opening an edge corresponds to choosing a particular position in the circular string leading to an actual string x ∈ Σ<sup>∗</sup> which is mapped to **A**<sup>x</sup> by the GWM. For arbitrary labeled graphs, we have fA(G) = Tr(**A**<sup>G</sup>*<sup>e</sup>* ) similarly to the case of circular strings. One can then consider the algebra <sup>A</sup> generated by the matrices **<sup>A</sup>**<sup>G</sup>*<sup>e</sup>* for any graph <sup>G</sup> in some family of graphs and any edge e in G, and define a semi-simple GWM as a GWM for which this algebra A is semi-simple (note that one exactly recovers the notion of semi-simple GWM introduced here in the special case of circular strings). Hence, the fundamental results from algebra theory we leveraged in this paper should be directly relevant to the study of general GWMs. Beyond minimization, we intend to study the problem of approximate minimization (such as the ones considered in [7,23] for string and tree weighted automata) along with the closely related problem of learning GWMs defined over richer families of graphs than the one of circular strings.

**Acknowledgements.** The author acknowledges support of an IVADO postdoctoral fellowship and would like to thank the reviewers for their helpful comments as well as Philip Amortila, François Denis, Clara Lacroce, Prakash Panangaden and Joelle Pineau for fruitful discussions.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Games on Graphs with a Public Signal Monitoring**

Patricia Bouyer

LSV, CNRS, ENS Paris-Saclay, Université Paris-Saclay, Cachan, France
bouyer@lsv.fr

**Abstract.** We study pure Nash equilibria in games on graphs with an imperfect monitoring based on a public signal. In such games, deviations and players responsible for those deviations can be hard to detect and track. We propose a generic epistemic game abstraction, which conveniently allows us to represent the knowledge of the players about these deviations, and we give a characterization of Nash equilibria in terms of winning strategies in the abstraction. We then use the abstraction to develop algorithms for some payoff functions.

### **1 Introduction**

Multiplayer concurrent games over graphs make it possible to model rich interactions between players. Those games are played as follows. In a state, each player chooses privately and independently an action, defining globally a move (one action per player); the next state of the game is then defined as the successor (on the graph) of the current state using that move; players continue playing from that new state, thereby forming an (infinite) play. Each player then gets a reward given by a payoff function (one function per player). In particular, objectives of the players may not be contradictory: those games are non-zero-sum games, contrary to the two-player games used for controller or reactive synthesis [23,30].

The problem of distributed synthesis [25] can be formulated using multiplayer concurrent games. In this setting, there is a global objective Φ, and one particular player called Nature. The question then is whether the so-called grand coalition (all players except Nature) can enforce Φ, whatever Nature does. While the players (except Nature) cooperate (and can initially coordinate), their choice of actions (or strategy) can only depend on what they see from the play so far. When modelling distributed synthesis as concurrent games, information players receive is given via a partial observation function of the states of the game. When the players have perfect monitoring of the play, the distributed synthesis problem reduces to a standard two-player zero-sum game. Distributed synthesis is a fairly hot topic, both using the formalization via concurrent games we have already described and using the formalization via an architecture of processes [26]. The most general decidability results in the concurrent game setting are under the

This work has been supported by ERC project EQualIS (FP7-308087).

© The Author(s) 2018

C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 530–547, 2018. https://doi.org/10.1007/978-3-319-89366-2\_29

assumption of hierarchical observation [6,36] (information received by the players is ordered) or more recently under recurring common knowledge [5].

While distributed synthesis involves several players, this nevertheless remains a zero-sum question. Using solution concepts borrowed from game theory, one can go a bit further in describing the interactions between the players, and in particular in describing rational behaviours of selfish players. One of the most basic solution concepts is that of Nash equilibria [24]. A Nash equilibrium is a strategy profile where no player can improve her payoff by unilaterally changing her strategy. The outcome of a Nash equilibrium can therefore be seen as a rational behaviour of the system. While very much studied by game theorists (e.g. over matrix games), such a concept (and variants thereof) has only rather recently been studied over games on graphs. Probably the first works in that direction are [15,17,32,33]. Several series of works have followed. To roughly give an idea of the existing results, pure Nash equilibria always exist in turn-based games for ω-regular objectives [35] but not in concurrent games; they can nevertheless be computed for large classes of objectives [9,11,35]. The problem becomes harder with mixed (that is, stochastic) Nash equilibria, for which the existence often cannot be decided [10,34].

Computing Nash equilibria requires one to (i) find a good behaviour of the system; (ii) detect deviations from that behaviour, and identify deviating players (called deviators); and (iii) punish them. This simple characterization of Nash equilibria is made explicit in [18]. Variants of Nash equilibria require slightly different ingredients, but they are mostly of a similar vein.

In (almost) all these works though, perfect monitoring is implicitly assumed: in all cases, players get full information on the states which are visited. A slightly imperfect monitoring is assumed in some works on concurrent games (like [9]), where the selected actions are not made available to all the players (we speak of hidden actions). This can yield some uncertainty when detecting deviators, but not about the states the game can be in, which is rather limited and can actually be handled.

In this work, we integrate imperfect monitoring into the problem of deciding the existence of pure Nash equilibria and computing witnesses. We choose to model imperfect monitoring via the notion of signal, which, given a joint decision of the players together with the next state the play will be in, gives some information to the players. To take further decisions, players get information from the signals they receive, and have perfect recall of the past (their own actions and the signals they received). We believe this is a meaningful framework. Let us give an example of a wireless network in which several devices try to send data: each device can modulate its transmission power, in order to maximise its bandwidth and reduce energy consumption as much as possible. However, there might be a degradation of the bandwidth due to other devices, and the satisfaction of each device is measured as a compromise between energy consumption and allocated bandwidth, given by a quantitative payoff function.<sup>1</sup> In such a problem, it is natural to assume that a device only gets global information about the load of the network, and not about each other device connected to the network. This can be expressed using imperfect monitoring via public signals.

Following [31] in the framework of repeated matrix games, we put forward a notion of *public signal*. A signal is said to be public whenever it is common to all players. That is, after each move, all the players get the same information (their own action of course remains private). We will also distinguish several kinds of payoff functions, depending on whether they are publicly visible (they only depend on the public signal), privately visible (they depend on the public signal and on private actions: the corresponding player knows her payoff!), or invisible (players may not even be sure of their payoff).

The payoff functions we will focus on in this paper are Boolean ω-regular payoff functions and mean payoff functions. Some of the decidability results can be extended in various directions, which we will mention along the way.

As initial contributions of the paper, we show some undecidability results, and in particular that the hypothesis of a public signal alone is not sufficient to recover all the nice decidability results: for mean payoff functions, which are privately visible, one cannot decide the constrained existence of a Nash equilibrium. The constrained existence problem asks for the existence of a Nash equilibrium whose payoff satisfies a given constraint.

The main contribution of the paper is the construction of a so-called *epistemic game abstraction*. This abstraction is a two-player turn-based game in which we show that winning strategies of one of the players (Eve) actually correspond to Nash equilibria in the original game. The winning condition for Eve is rather complex, but can be simplified in the case of publicly visible payoff functions. The epistemic game abstraction is inspired by both the epistemic unfolding of [4] used for distributed synthesis, and the suspect game abstraction of [9] used to compute Nash equilibria in concurrent games with hidden actions. In our abstraction, we nevertheless do not fully formalize epistemic unfoldings, and concentrate on the structure of the knowledge which is useful under the assumption of public signals; we show that several subset constructions (as done initially in [27], and used on various occasions since then, see e.g. [14,19,20,22]), made in parallel, are sufficient to represent the knowledge of all the players. The framework of [9] happens to be a special case of the public signal monitoring framework of the current paper. This construction can therefore be seen as an extension of the suspect game abstraction.

This generic construction can be applied to several frameworks with publicly visible payoff functions. We give two such applications, one with Boolean ω-regular payoff functions and one with mean payoff functions.

<sup>1</sup> This can be expressed by $\mathsf{payoff}\_{\text{player } i} = \frac{R}{\mathsf{power}\_i}\left(1 - e^{-0.5\gamma\_i}\right)^L$ where $\gamma\_i$ is the signal-to-interference-and-noise ratio for player $i$, $R$ is the rate at which the wireless system transmits the information and $L$ is the size of the packets [29].
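To make the trade-off behind this payoff concrete, the sketch below evaluates the footnote's formula for a few power levels. All constants and the linear SINR model are illustrative assumptions, taken neither from the paper nor from [29].

```python
import math

def payoff(R, power, gamma, L):
    """payoff_i = (R / power_i) * (1 - exp(-0.5 * gamma_i)) ** L  (footnote formula)."""
    return (R / power) * (1.0 - math.exp(-0.5 * gamma)) ** L

# Illustrative constants only (rate R, packet size L, noise level): none of
# these numbers, nor the linear SINR model below, come from the paper
R, L, noise = 1e4, 80, 1.0
for p in (0.5, 1.0, 2.0, 4.0):
    gamma = 10.0 * p / noise    # toy model: SINR proportional to own power
    print(p, round(payoff(R, p, gamma, L), 1))
# the payoff peaks at an intermediate power level: too little power ruins the
# transmission success probability, too much power wastes energy
```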

*Further Related Works.* We have already discussed several kinds of related works. Let us give some final remarks on related works.

We have mentioned earlier that one of the problems for computing Nash equilibria is to detect deviations and the players who deviated. Somehow, the epistemic game abstraction tracks the potential deviators, and even though players might not know who exactly is responsible for the deviation (there might be several suspects), they can try to punish all potential suspects. And that is what we do here. Very recently, [7] discussed the detection of deviators, and gave some conditions for them to become common knowledge of the other players. In our framework, even though deviators may not become fully common knowledge, we can design mechanisms to punish the relevant ones.

Recently, imperfect information has also been introduced in the setting of multi-agent temporal logics [2,3,20,21], and the main decidability results assume hierarchical information. However, while those logics can express rich interactions, they can somehow only capture qualitative properties. Furthermore, no tight complexity bounds are provided.

In [11], a deviator game abstraction is proposed. It twists the suspect game abstraction [9] to allow for more general solution concepts (so-called robust equilibria), but it assumes visibility of actions (hence removes any kind of uncertainty). Relying on results of [13], this deviator game abstraction allows one to compute equilibria with mean payoff functions. Our algorithms for mean payoff functions also rely on the polyhedron problem of [13].

A full version of this paper with all proofs is available as [8]. In this extended abstract, we chose to focus on the construction of the epistemic game abstraction and to be more sketchy on the algorithms to compute Nash equilibria. We indeed believe that the structure of the knowledge represented by the abstraction is the most important contribution, and that the algorithms are more standard. However, we believe it is important that the abstract construction can be applied for algorithmic purposes.

### **2 Definitions**

Throughout the paper, if $S \subseteq \mathbb{R}$, we write $\overline{S}$ for $S \cup \{-\infty, +\infty\}$.

#### **2.1 Concurrent Multiplayer Games with Signals**

We consider the model of concurrent multi-player games, based on the two-player model of [1]. This model of games was used for instance in [9]. We equip games with *signals*, which will give information to the players.

**Definition 1.** *A* concurrent game with signals *is a tuple*

$$\mathcal{G} = \langle V, v\_{\text{init}}, \mathcal{P}, \mathsf{Act}, \Sigma, \mathsf{Allow}, \mathsf{Tab}, (\ell\_A)\_{A \in \mathcal{P}}, (\mathsf{payoff}\_A)\_{A \in \mathcal{P}} \rangle$$

*where $V$ is a finite set of vertices, $v\_{\text{init}} \in V$ is the initial vertex, $\mathcal{P}$ is a finite set of players, $\mathsf{Act}$ is a finite set of actions, $\Sigma$ is a finite alphabet, $\mathsf{Allow} \colon V \times \mathcal{P} \to 2^{\mathsf{Act}} \setminus \{\emptyset\}$ is a mapping indicating the actions available to a given player in a given vertex, $\mathsf{Tab} \colon V \times \mathsf{Act}^\mathcal{P} \to V$ associates, with a given vertex and a given action tuple, the target vertex, for every $A \in \mathcal{P}$, $\ell\_A \colon \mathsf{Act}^\mathcal{P} \times V \to \Sigma$ is a signal, and $\mathsf{payoff}\_A \colon V \cdot \left(\mathsf{Act}^\mathcal{P} \cdot V\right)^\omega \to D$ is a payoff function with values in a domain $D \subseteq \overline{\mathbb{R}}$. We say that the game has* public signal *if there is $\ell \colon \mathsf{Act}^\mathcal{P} \times V \to \Sigma$ such that for every $A \in \mathcal{P}$, $\ell\_A = \ell$.*

The signals will help the players monitor the game: for taking decisions, a player will have the information given by her signal and the actions she played earlier. A public signal is a common piece of information given to all the players. Our notion of public signal is inspired by [31] and encompasses the model of [9], where only action names were hidden from the players. Note that monitoring by public signal does not mean that all the players have the same information: they have private information implied by their own actions.

An element of $\mathsf{Act}^\mathcal{P}$ is called a move. When an explicit order is given on the set of players $\mathcal{P} = \{A\_1, \dots, A\_{|\mathcal{P}|}\}$, we will write a move $m = (m\_A)\_{A\in\mathcal{P}}$ as $\langle m\_{A\_1}, \dots, m\_{A\_{|\mathcal{P}|}} \rangle$. If $m \in \mathsf{Act}^\mathcal{P}$ and $A \in \mathcal{P}$, we write $m(A)$ for the $A$-component of $m$ and $m(-A)$ for all but the $A$-component of $m$. In particular, we write $m(-A) = m'(-A)$ whenever $m(B) = m'(B)$ for every $B \in \mathcal{P} \setminus \{A\}$.

A *full history* h in G is a finite sequence

$$v\_0 \cdot m\_0 \cdot v\_1 \cdot m\_1 \dots m\_{k-1} \cdot v\_k \in V \cdot \left(\mathsf{Act}^{\mathcal{P}} \cdot V\right)^\*$$

such that for every $0 \leq i < k$ and every $A \in \mathcal{P}$, $m\_i(A) \in \mathsf{Allow}(v\_i, A)$, and $v\_{i+1} = \mathsf{Tab}(v\_i, m\_i)$. For readability we will also write $h$ as $v\_0 \xrightarrow{m\_0} v\_1 \xrightarrow{m\_1} \cdots \xrightarrow{m\_{k-1}} v\_k$.

We write $\mathit{last}(h)$ for the last vertex of $h$ (i.e., $v\_k$). If $i \leq k$, we also write $h\_{\leq i}$ for the prefix $v\_0 \cdot m\_0 \cdot v\_1 \cdot m\_1 \cdots m\_{i-1} \cdot v\_i$. We write $\mathsf{Hist}\_\mathcal{G}(v\_0)$ (or simply $\mathsf{Hist}(v\_0)$ if $\mathcal{G}$ is clear from the context) for the set of full histories in $\mathcal{G}$ that start at $v\_0$.

Let A ∈ *P*. The projection of h for A is denoted πA(h) and is defined as:

$$v\_0 \cdot \left(m\_0(A), \ell\_A(m\_0, v\_1)\right) \cdots \left(m\_{k-1}(A), \ell\_A(m\_{k-1}, v\_k)\right) \in V \cdot \left(\mathsf{Act} \times \Sigma\right)^\*$$

This will be the information available to player $A$: it contains both the actions she played so far and the signals she received. Note that we assume perfect recall: while playing, $A$ remembers all her past knowledge, that is, all of $\pi\_A(h)$ if $h$ has been played so far. We define the *undistinguishability relation* $\sim\_A$ as the equivalence relation over full histories induced by $\pi\_A$: for two histories $h$ and $h'$, $h \sim\_A h'$ iff $\pi\_A(h) = \pi\_A(h')$. While playing, if $h \sim\_A h'$, $A$ is not able to know whether $h$ or $h'$ has been played. We also define the $A$-label of $h$ as $\ell\_A(h) = \ell\_A(m\_0, v\_1) \cdot \ell\_A(m\_1, v\_2) \cdots \ell\_A(m\_{k-1}, v\_k)$.

We extend all the above notions to infinite sequences in a straightforward way, yielding the notion of *full play*. We write $\mathsf{Plays}\_\mathcal{G}(v\_0)$ (or simply $\mathsf{Plays}(v\_0)$ if $\mathcal{G}$ is clear from the context) for the set of full plays in $\mathcal{G}$ that start at $v\_0$.

We will say that the game G has *publicly (resp. privately) visible payoffs* if for every A ∈ *P*, for every v_0 ∈ V, for every ρ, ρ′ ∈ Plays(v_0), ℓ_A(ρ) = ℓ_A(ρ′) (resp. ρ ∼_A ρ′) implies payoff_A(ρ) = payoff_A(ρ′). Otherwise payoffs are said *invisible*. Private visibility of payoffs, while not always assumed (see for instance [3,19]), is a reasonable assumption: using only her knowledge, a player knows her payoff. Public visibility is more restrictive, but will be required for some of the results.

Let A ∈ *P* be a player. A *strategy* for player A from v_0 is a mapping σ_A : Hist(v_0) → Act such that for every history h ∈ Hist(v_0), σ_A(h) ∈ Allow(*last*(h)). It is said A-*compatible* whenever furthermore, for all histories h, h′ ∈ Hist(v_0), h ∼_A h′ implies σ_A(h) = σ_A(h′). An *outcome* of σ_A is a(n infinite) play ρ = v_0 · m_0 · v_1 · m_1 ⋯ such that for every i ≥ 0, σ_A(ρ_{≤i}) = m_i(A). We write out(σ_A, v_0) for the set of outcomes of σ_A from v_0.

A *strategy profile* is a tuple σ_P = (σ_A)_{A∈P}, where, for every player A ∈ *P*, σ_A is a strategy for player A. The strategy profile is said *info-compatible* whenever each σ_A is A-compatible. We write out(σ_P, v_0) for the unique full play from v_0 which is an outcome of all the strategies in σ_P.

When σ_P is a strategy profile and σ′_A a player-A strategy, we write σ_P[A/σ′_A] for the profile where A plays according to σ′_A, and each other player B plays according to σ_B. The strategy σ′_A is a *deviation* of player A, or an A-*deviation*.

**Definition 2.** *A* Nash equilibrium *from v_0 is an info-compatible strategy profile σ such that for every A ∈ P, for every player-A A-compatible strategy σ′_A,* payoff_A(out(σ, v_0)) ≥ payoff_A(out(σ[A/σ′_A], v_0)).

In this definition, the deviation σ′_A need not be A-compatible, since the only meaningful part of σ′_A is along out(σ[A/σ′_A], v_0), where there are no ∼_A-equivalent histories: any deviation can be made A-compatible without affecting the profitability of the resulting outcome. Note also that there might be an A-deviation σ′_A which is not observable by another player B (out(σ, v_0) ∼_B out(σ[A/σ′_A], v_0)), and there might be two deviations σ′_B (by player B) and σ′_C (by player C) that cannot be distinguished by player A (out(σ[B/σ′_B], v_0) ∼_A out(σ[C/σ′_C], v_0)). Tracking such deviations will be the core of the abstraction we develop.
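The Nash-equilibrium condition of Definition 2 can be illustrated on a one-shot concurrent game, where checking it amounts to ruling out profitable unilateral deviations. The following is a toy sketch under our own encoding (profiles and payoffs as dicts), not the paper's general infinite-horizon setting:

```python
def is_nash(profile, actions, payoff):
    """True iff no player can improve her own payoff by a unilateral
    deviation; profile and payoff(profile) map players to actions/values."""
    base = payoff(profile)
    for player in profile:
        for alt in actions[player]:
            deviated = dict(profile, **{player: alt})   # profile[A/alt]
            if payoff(deviated)[player] > base[player]:
                return False
    return True
```

For instance, in a two-player coordination game where both players get 1 when their actions match and 0 otherwise, the matching profiles are Nash equilibria and the mismatching ones are not.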

*Payoff Functions.* In the following we will consider various payoff functions. Let Φ be an ω-regular property over some alphabet Γ. The function *pay*_Φ : Γ^ω → {0, 1} is defined by: for every **a** ∈ Γ^ω, *pay*_Φ(**a**) = 1 if and only if **a** ⊨ Φ. A publicly (resp. privately) visible payoff function payoff_A for player A is said to be associated with Φ over Σ (resp. Act × Σ) whenever it is defined by payoff_A(ρ) = *pay*_Φ(ℓ_A(ρ)) (resp. payoff_A(ρ) = *pay*_Φ(π_A(ρ)_{−v_0}), where π_A(ρ)_{−v_0} crops the initial v_0). Such a payoff function is called a Boolean ω-regular payoff function.

Let Γ be a finite alphabet and w : Γ → ℤ be a weight function assigning a value to every letter of that alphabet. We define two payoff functions over Γ^ω by: for every $\mathbf{a} = (a_i)_{i\ge 1} \in \Gamma^\omega$, $\mathit{pay}_{\underline{\mathsf{MP}}_w}(\mathbf{a}) = \liminf_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} w(a_i)$ and $\mathit{pay}_{\overline{\mathsf{MP}}_w}(\mathbf{a}) = \limsup_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} w(a_i)$. A publicly visible payoff function payoff_A for player A is said to be associated with the liminf (resp. limsup) mean payoff of w whenever it is defined by payoff_A(ρ) = $\mathit{pay}_{\underline{\mathsf{MP}}_w}(\ell_A(\rho))$ (resp. $\mathit{pay}_{\overline{\mathsf{MP}}_w}(\ell_A(\rho))$). A privately visible payoff function payoff_A for player A is said to be associated with the liminf (resp. limsup) mean payoff of w whenever it is defined by payoff_A(ρ) = $\mathit{pay}_{\underline{\mathsf{MP}}_w}(\pi_A(\rho)_{-v_0})$ (resp. $\mathit{pay}_{\overline{\mathsf{MP}}_w}(\pi_A(\rho)_{-v_0})$).
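On ultimately periodic sequences (the ones relevant for finite-memory analyses), the liminf and limsup mean payoffs coincide and equal the average weight of the repeated cycle; the finite prefix does not influence the long-run average. A small sketch of this standard fact (our own helper, not from the paper):

```python
from fractions import Fraction

def cycle_mean_payoff(cycle, w):
    """Mean payoff of any word u . cycle^omega: liminf and limsup both
    equal the average weight of the cycle, whatever the prefix u is."""
    return Fraction(sum(w[a] for a in cycle), len(cycle))
```

E.g. with w(a) = 2 and w(b) = −1, any word ending in (ab)^ω has mean payoff 1/2.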

**Fig. 1.** An example of a concurrent game with public signal (yellow and green: public signal). Edges in red and bold are part of the strategy profile. Dashed edges are the possible deviations. One can notice that none of the deviations is profitable to the deviator, hence the strategy profile is a Nash equilibrium. Convention in the drawing: edges with no label stand for the complementary labels (for instance, the edge from v_5 to the leaf ⟨0, 0, 0⟩ is labelled by all ⟨a_1, a_2, a_3⟩ not in the set {⟨a, a, a⟩, ⟨b, a, a⟩, ⟨b, a, b⟩}). (Color figure online)

*Example 1.* We now illustrate most notions on the game of Fig. 1. This is a game with three players A_1, A_2 and A_3, which is basically played in two steps, starting at v_0. Graphically, an edge labelled ⟨a_1, a_2, a_3⟩ between two vertices v and v′ represents the fact that a_i ∈ Allow(v, A_i) for every i ∈ {1, 2, 3} and that v′ = Tab(v, ⟨a_1, a_2, a_3⟩). As a convention, ∗ stands for both a and b. For readability, bottom vertices explicitly indicate the payoffs of the three players (in the same order as for actions) if the game ends in that vertex.

After the first step of the game, the signal yellow or green is sent to all the players. The histories v_0 · ⟨a, b, a⟩ · v_2 and v_0 · ⟨a, a, a⟩ · v_1 are undistinguishable by A_1 and A_3 (same action, same signal), but they can be distinguished by A_2 because of the different actions (even if the same signal is observed).

In bold red, we have depicted a strategy profile, which is actually a Nash equilibrium. We analyze the possible deviations in this game to argue for this.


Upon receiving the green signal, player A_2 does not know whether the game proceeds to v_4 or v_5 (but she knows that if A_1 has deviated, then we are in v_4, and if A_3 has deviated, we are in v_5). Then, A_2 has to find a way to punish both players, to be safe. On the other hand, both players A_1 and A_3 know precisely what has happened: in case she did not deviate herself, each knows that the other one deviated, and she knows in which state the game is. Hence in state v_4, A_3 can help player A_2 punish A_1, whereas in state v_5, A_1 can help player A_2 punish A_3. Examples of punishing moves are therefore those depicted in red and bold; they are part of the global strategy profile. Note that the action of A_2 out of v_5 has to be the same as the one out of v_4: this is required given the imperfect knowledge of A_2. On the other hand, the action of A_3 can be different out of v_4 and out of v_5 (which is the case in the given example profile).

*Two-Player Turn-Based Game Structures.* These are specific cases of the previous model, where at each vertex, at most one player has more than one action in her set of allowed actions. For convenience, we give a simplified definition, with only the objects that will be useful. A two-player turn-based game structure is a tuple G = ⟨S, S_Eve, S_Adam, s_init, A, Allow, Tab⟩, where S = S_Eve ⊎ S_Adam is a finite set of states (states in S_Eve belong to player Eve whereas states in S_Adam belong to player Adam), s_init ∈ S is the initial state, A is a finite alphabet, Allow : S → 2^A \ {∅} gives the set of available actions, and Tab : S × A → S is the next-state function. If s ∈ S_Eve (resp. S_Adam), Allow(s) is the set of actions allowed to Eve (resp. Adam) in state s.

In this context, strategies will see sequences of states and actions, with full information. Note that we do not include any winning condition or payoff function in the tuple, hence the name structure.

#### **2.2 The Problem**

We are interested in the constrained existence of a Nash equilibrium. For simplicity, we define constraints using non-strict threshold constraints, but one could well impose more involved constraints.

*Problem 1 (Constrained existence problem).* Given a game with signals G = ⟨V, v_init, *P*, Act, Σ, Allow, Tab, (ℓ_A)_{A∈P}, (payoff_A)_{A∈P}⟩ and threshold vectors (ν_A)_{A∈P}, (ν′_A)_{A∈P} ∈ ℚ^P, can we decide whether there exists a Nash equilibrium σ_P from v_init such that for every A ∈ *P*, ν_A ≤ payoff_A(out(σ_P, v_init)) ≤ ν′_A? If so, compute one. If the constraints on the payoff are trivial (that is, ν_A = −∞ and ν′_A = +∞ for every A ∈ *P*), we simply speak of the existence problem.

#### **2.3 First Undecidability Results**

In this section we state two preliminary undecidability results.

**Theorem 1.** *– The existence problem in games with signals is undecidable with three players and publicly visible Boolean* ω*-regular payoff functions.*

*– The constrained existence problem in games with a public signal is undecidable with two players and privately visible mean payoff functions.*

Proofs of these results rely on the distributed synthesis problem [26] for the first one, and on blind two-player mean-payoff games [19] for the second one. While there is no real surprise in the first result since we know that arbitrary partial information yields intrinsic difficulties, the second one suggests restrictions both to public signals and to publicly visible payoff functions.

In the following we will focus on public signals and develop an epistemic game abstraction, which will record and track possible deviations in the game. This will then be applied to get decidability results in two frameworks assuming publicly visible payoff functions.

### **3 The Epistemic Game Abstraction**

Building over [4,9], we construct an epistemic game, which will record possible behaviours of the system, together with possible unilateral deviations. In [4], notions of epistemic Kripke structures are used to really track the precise knowledge of the players. These are mostly useful since undistinguishable states (expressed using signals here) are assumed arbitrary (no hierarchical structure). We could do the same here, but we think that would be overly complex and hide the real structure of knowledge in the framework of public signals. We therefore prefer to stick to simpler subset constructions, which are more commonly used (see e.g. [27] or later [14,19,22]), though it has to be a bit more involved here since also deviations have to be tracked.

Let G = ⟨V, v_init, *P*, Act, Σ, Allow, Tab, ℓ, (payoff_A)_{A∈P}⟩ be a concurrent game with public signal. We will first define the epistemic abstraction as a two-player game structure E_G = ⟨S_Eve, S_Adam, s_init, Σ′, Allow′, Tab′⟩, and then state the correspondence between G and E_G. The epistemic abstraction will later be used for decidability and algorithmics purposes. For clarity, we use the terminology "vertices" in G and "states" (or "epistemic states") in E_G.

#### **3.1 Construction of the Game Structure E_G**

The game <sup>E</sup><sup>G</sup> will be played between two players, Eve and Adam. The aim of Eve is to build a suitable Nash equilibrium, whereas the aim of Adam is to prove that it is not an equilibrium; in particular, Adam will try to find a profitable deviation (to disprove the claim of Eve that she is building a Nash equilibrium). Choices available to Eve and Adam in the abstract game have to reflect partial knowledge of the players in the original game G. States in the abstract game will therefore store information, which will be sufficient to infer the undistinguishability relation of all the players in the original game. Thanks to the public signal assumption, this information will be simple enough to have a simple structure.

In the following, we set *P*_⊥ = *P* ∪ {⊥}, where ⊥ is a fresh symbol. For convenience, if m ∈ Act^P, we extend the notation m(−A), defined when A ∈ *P*, to *P*_⊥ by setting m(−⊥) = m. We now describe all the components of E_G.

A state of Eve will store a set of vertices of the original game one can be in, together with the possible deviators. More precisely, states of Eve are defined as S_Eve = {s : *P*_⊥ → 2^V | |s(⊥)| ≤ 1}. Let s ∈ S_Eve. If A ∈ *P*, the vertices of s(A) are those the game can be in, assuming one has followed the suggestions of Eve so far, up to an A-deviation; on the other hand, if s(⊥) ≠ ∅, the single vertex v ∈ s(⊥) is the one the game is in, assuming one has followed all suggestions by Eve so far (in particular, if Eve is building a Nash equilibrium, then this vertex belongs to the main outcome of the equilibrium). We define sit(s) = {(v, A) ∈ V × *P*_⊥ | v ∈ s(A)} for the set of *situations* the game can be in at s.


The structure of state s allows one to infer the undistinguishability relation of all the players in game G: basically (we will formalize this later), if she is not responsible for a deviation, player A ∈ *P* will not know in which of the situations of sit(s) \ (V × {A}) the game has proceeded; if she is responsible for a deviation, player A will know exactly in which vertex v ∈ s(A) the game has proceeded.

Let s ∈ S_Eve. From state s, Eve will suggest a tuple of moves M, one for each possible situation (v, A) ∈ sit(s). This tuple of moves has to satisfy the undistinguishability relation: if a player does not distinguish between two situations, her action should be the same in these two situations:

$$\mathsf{Allow}'(s) = \left\{ M \in \prod\_{(v,A) \in \mathsf{sit}(s)} \mathsf{Allow}(v) \;\middle|\; \forall (v\_B, B), (v\_C, C) \in \mathsf{sit}(s),\ \forall A \in \mathcal{P} \setminus \{B, C\},\ M(v\_B, B)(A) = M(v\_C, C)(A) \right\}$$

In the above set, the constraint M(v_B, B)(A) = M(v_C, C)(A) expresses the fact that player A should play the same action in the two situations (v_B, B) and (v_C, C), since she does not distinguish between them. Obviously, we assume that Σ′ contains all elements of Allow′(s) above.
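This compatibility constraint can be enumerated directly. The following Python sketch is our own rendering of Allow′(s) (the name `allow_prime`, the dict encoding of moves and the list encoding of sit(s) are assumptions, not the paper's notation):

```python
from itertools import product

def allow_prime(sit, allow, players):
    """Yield the tuples M of Allow'(s): one move per situation (v, A) in
    sit(s), such that any player who cannot tell two situations apart
    (i.e. any player other than the two candidate deviators) plays the
    same action in both."""
    sits = list(sit)
    for combo in product(*(allow[v] for v, _ in sits)):
        M = dict(zip(sits, combo))        # M[(v, A)] is a move: player -> action
        if all(M[(vB, B)][A] == M[(vC, C)][A]
               for (vB, B) in sits for (vC, C) in sits
               for A in players if A not in {B, C}):
            yield M
```

On the situation set {(v_4, A_1), (v_5, A_3)} of Example 2 below, the constraint pins A_2's action to be equal in the two moves, while A_1 and A_3 remain free.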

States of Adam are then copies of states of Eve together with the suggestions given by Eve, that is: S_Adam = {(s, M) | s ∈ S_Eve, M ∈ Allow′(s)}. And naturally, we define Tab′(s, M) = (s, M) if M ∈ Allow′(s).

Let (s, M) ∈ S_Adam. From state (s, M), Adam will choose a signal value which can be activated from some situation allowed in s, after no deviation or a single-player deviation w.r.t. M. From a situation (v, A) ∈ sit(s) with A ∈ *P*, only A-deviations can be allowed (since we look for unilateral deviations), hence any signal activated by an A-deviation (w.r.t. M(v, A)) from v should be allowed. From the situation (v, ⊥) ∈ sit(s) (if there is one), one can continue without any deviation, or any kind of single-player deviation should be allowed; hence the signal activated by M(v, ⊥) from v should be allowed, and any signal activated by some A-deviation (w.r.t. M(v, ⊥)) from v should be allowed as well. Formally:

$$\begin{split} \mathsf{Allow}'(s, M) &= \left\{ \beta \in \Sigma \,\middle|\, \exists A \in \mathcal{P},\ \exists v \in s(A),\ \exists m \in \mathsf{Act}^{\mathcal{P}} \text{ s.t. } m(-A) = M(v, A)(-A) \text{ and } \ell(m, \mathsf{Tab}(v, m)) = \beta \right\} \\ &\cup \left\{ \beta \in \Sigma \,\middle|\, \exists v \in s(\bot),\ \exists A \in \mathcal{P},\ \exists m \in \mathsf{Act}^{\mathcal{P}} \text{ s.t. } m(-A) = M(v, \bot)(-A) \text{ and } \ell(m, \mathsf{Tab}(v, m)) = \beta \right\} \end{split}$$

Note that we implicitly assume that Σ′ contains Σ.

It remains to explain how one can compute the next state of some (s, M) ∈ S_Adam after some signal value β ∈ Allow′(s, M). The new state has to represent the new knowledge of the players in the original game when they have seen signal β; this has to take into account all the possible deviations discussed above which activate the signal value β. The new state is the result of several simultaneous subset constructions, which we formalize as follows: s′ = Tab′((s, M), β), where for every A ∈ *P*_⊥, v′ ∈ s′(A) if and only if there is m ∈ Act^P such that β = ℓ(m, v′), and:

1. either there is v ∈ s(A) such that m(−A) = M(v, A)(−A) and v′ = Tab(v, m);
2. or there is v ∈ s(⊥) such that m(−A) = M(v, ⊥)(−A) and v′ = Tab(v, m).

Note that in case A = ⊥, the two above cases are redundant.
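The simultaneous subset construction for Tab′ can be sketched as follows (our own encoding: an epistemic state is a dict from deviator tags, with `None` standing for ⊥, to sets of vertices; `tab_prime` and the other names are assumptions):

```python
def tab_prime(s, M, beta, moves, tab, label, players):
    """s' = Tab'((s, M), beta): for each tag, collect the vertices
    reachable by a move that agrees with Eve's suggestion for all
    players other than the (candidate) deviator and emits signal beta."""
    s2 = {tag: set() for tag in s}
    for tag, vs in s.items():
        others = [B for B in players if B != tag]
        for m in moves:
            # case 1: the game was already tracked under this tag
            for v in vs:
                if all(m[B] == M[(v, tag)][B] for B in others):
                    v2 = tab(v, m)
                    if label(m, v2) == beta:
                        s2[tag].add(v2)
            # case 2: player `tag` starts deviating from the main outcome
            if tag is not None:
                for v in s.get(None, set()):
                    if all(m[B] == M[(v, None)][B] for B in others):
                        v2 = tab(v, m)
                        if label(m, v2) == beta:
                            s2[tag].add(v2)
    return s2
```

Note that for tag `None` the agreement condition ranges over all players, i.e. the move must be exactly Eve's suggestion, matching m(−⊥) = m.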

Before stating properties of E_G, we illustrate the construction.

*Example 2.* We consider again the example of Fig. 1, and we assume that the public signal when reaching the leaves of the game is uniformly orange. We depict (part of) the epistemic game abstraction of the game in Fig. 2. One can notice that from Eve-states s_1 and s_2, moves are multi-dimensional, in the sense that there is one move per vertex appearing in the state. There are nevertheless compatibility conditions which have to be satisfied (expressed in the condition Allow′); for instance, from s_2, player A_2 does not distinguish between the two options (i) A_1 has deviated and the game is in v_4, and (ii) A_3 has deviated and the game is in v_5, hence the action of player A_2 should be the same in the two moves (a in the depicted example, written in red).

#### **3.2 Interpretation of this Abstraction**

While we gave an intuitive meaning to the (epistemic) states of E_G, we now need to formalize this. To do that, we need to explain how full histories and plays in E_G can be interpreted as full histories and plays in G.

Let v_0 ∈ V, and define s_0 : *P*_⊥ → 2^V ∈ S_Eve such that s_0(⊥) = {v_0} and s_0(A) = ∅ for every A ∈ *P*. In the following, when M ∈ Allow′(s) for some s ∈ S_Eve, if we speak of some M(v, A), we implicitly assume that (v, A) ∈ sit(s). Given a full history $H = s_0 \xrightarrow{M_0} (s_0, M_0) \xrightarrow{\beta_0} s_1 \xrightarrow{M_1} (s_1, M_1) \xrightarrow{\beta_1} s_2 \cdots (s_{k-1}, M_{k-1}) \xrightarrow{\beta_{k-1}} s_k$ in E_G, we write *concrete*(H) for the set of full histories in the original game which correspond to H, up to a single deviation, that is: $v_0 \xrightarrow{m_0} v_1 \xrightarrow{m_1} v_2 \cdots v_{k-1} \xrightarrow{m_{k-1}} v_k \in \mathit{concrete}(H)$ whenever for every 0 ≤ i ≤ k − 1, v_{i+1} = Tab(v_i, m_i) and β_i = ℓ(m_i, v_{i+1}), and:

**Fig. 2.** Part of the epistemic game corresponding to the game of Fig. 1. For clarity, symbol − is for any choice a or b (the precise choice is meaningless). (Color figure online)

	- (a) either for every 0 ≤ i ≤ k − 1, m_i = M_i(v_i, ⊥);
	- (b) or there exist A ∈ *P* and 0 ≤ i_0 ≤ k − 1 such that:
		- (i) for every 0 ≤ i < i_0, m_i = M_i(v_i, ⊥);
		- (ii) m_{i_0} ≠ M_{i_0}(v_{i_0}, ⊥), but m_{i_0}(−A) = M_{i_0}(v_{i_0}, ⊥)(−A);
		- (iii) for every i_0 < i ≤ k − 1, m_i(−A) = M_i(v_i, A)(−A).

Case (a) corresponds to a concrete history with no deviation (all moves suggested by Eve have been followed). Case (b) corresponds to a deviation by player A, and i<sup>0</sup> is the position at which player A has started deviating.

We write *concrete*_⊥(H) for the set of histories of type (a); there is at most one such history, which is the real concrete history suggested by Eve. And we write *concrete*_A(H) for the set of histories of type (b) with deviator A. The correctness of the approach is obtained thanks to the following characterization of the undistinguishability relations along H: for every A ∈ *P*, for every h_1 ≠ h_2 ∈ *concrete*(H),

$$h\_1 \sim\_A h\_2 \text{ iff } h\_1, h\_2 \notin concrete\_A(H).$$

In particular, a player may not distinguish between deviations by other players, or between a deviation by another player and the real concrete history suggested by Eve. But of course, in any case, a player will know that she has deviated!

We extend all these notions to full plays. A full play visiting only Eve-states s such that s(⊥) ≠ ∅ is called a ⊥-play.

#### **3.3 Winning Condition of Eve**

A zero-sum game will be played on the game structure E_G, and the winning condition of Eve will be given on the branching structure of the set of outcomes of a strategy for Eve, and not individually on each outcome, as is standard in two-player zero-sum games. We write s_init for the state of Eve such that s_init(⊥) = {v_init} and s_init(A) = ∅ for every A ∈ *P*. Let p = (p_A)_{A∈P} ∈ ℝ^P, and let σ_Eve be a strategy for Eve in E_G; it is said *winning* for p from s_init whenever payoff(ρ) = p, where ρ is the unique element of *concrete*_⊥(out_⊥(σ_Eve, s_init)) (where we write out_⊥(σ_Eve, s_init) for the unique outcome of σ_Eve from s_init which is a ⊥-play), and for every R ∈ out(σ_Eve, s_init), for every A ∈ *P*, for every ρ ∈ *concrete*_A(R), payoff_A(ρ) ≤ p_A.

For every epistemic state s ∈ S_Eve, we define the set of *suspect* players susp(s) = {A ∈ *P* | s(A) ≠ ∅} (this is the set of players that may have deviated). By extension, if $R = s_0 \xrightarrow{M_0} (s_0, M_0) \xrightarrow{\beta_0} s_1 \cdots s_k \xrightarrow{M_k} (s_k, M_k) \xrightarrow{\beta_k} s_{k+1} \cdots$, we define susp(R) = lim_{k→∞} susp(s_k). Note that the sequence (susp(s_k))_k is non-increasing, hence it stabilizes.
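A small sketch of the suspect-set computation (our own encoding of epistemic states as dicts, with `None` standing for ⊥): since the sequence of suspect sets along a play is non-increasing, its limit is simply the intersection of the sets seen along any sufficiently long prefix.

```python
def susp(s):
    """Suspect players of an epistemic state: those A with s(A) non-empty."""
    return {A for A, vs in s.items() if A is not None and vs}

def susp_limit(states):
    """Limit of the non-increasing chain susp(s_0) >= susp(s_1) >= ...,
    computed as the intersection over a prefix of the play."""
    limit = susp(states[0])
    for s in states[1:]:
        limit &= susp(s)
    return limit
```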

Assuming public visibility of the payoff functions in G, we can define, for R a full play in E_G and A ∈ *P*, payoff′_A(R) = payoff_A(ρ), where ρ ∈ *concrete*(R). It is easy to show that payoff′_A is well-defined for every A ∈ *P*. Under this assumption, the winning condition of Eve can be rewritten as: σ_Eve is winning for p from s_init whenever payoff′(out_⊥(σ_Eve, s_init)) = p, and for every R ∈ out(σ_Eve, s_init), for every A ∈ susp(R), payoff′_A(R) ≤ p_A.

#### **3.4 Correction of the Epistemic Abstraction**

The epistemic abstraction tracks everything that is required to detect Nash equilibria in the original game, which we make explicit in the next result. Note that this theorem does not require public visibility of the payoff functions.

**Theorem 2.** *Let G be a concurrent game with public signal, and p ∈ ℝ^P. There is a Nash equilibrium in G with payoff p from v_init if and only if Eve has a winning strategy for p in E_G from s_init.*

The proof of this theorem highlights a correspondence between Nash equilibria in G and winning strategies of Eve in E_G. In this correspondence, the main outcome of the equilibrium in G is the unique ⊥-concretisation of the unique ⊥-play generated by the winning strategy of Eve.

#### **3.5 Remarks on the Construction**

We did not formalize the epistemic unfolding as is done in [4]. We believe it would not bring much more insight in the setting of public signals, and the extended subset construction above is much easier to understand.

One could argue that this epistemic game gives more information to the players, since Eve explicitly gives everyone the move that should be played. But in the real game, the players also have that information, which is obtained by an initial coordination of the players (this is required to achieve equilibria).

Finally, notice that the epistemic game constructed here generalizes the suspect game construction of [9], where all players have perfect information on the states of the game, but cannot see the actions that are precisely played. Somehow, games in [9] have a public signal telling which state the game is in (that is, ℓ(m, v′) = v′). So, in the suspect game of [9], the sole uncertainty is about the players that may have deviated, not about the set of states that are visited.

*Remark 1.* Let us analyze the size of the epistemic game abstraction. The size of the alphabet Σ′ is bounded by |Σ| + |Act|^{|P|·|V|·(1+|P|)}. Furthermore, |Σ| is bounded by |V| · |Act|^{|P|}. The number of states is therefore in O(2^{|P|·|V|} · |Act|^{|P|²·|V|}). The epistemic game is therefore of exponential size w.r.t. the initial game. Note that we could reduce the bounds by using tricks like those in [9, Proposition 4.8], but this would not avoid an exponential blowup.

#### **4 Two Applications with Publicly Visible Payoffs**

While the construction of the epistemic game has transformed the computation of Nash equilibria in a concurrent game with public signal to the computation of winning strategies in a two-player zero-sum turn-based game, we cannot apply standard algorithms out-of-the-box, because the winning condition is rather complex. In the following, we present two applications of that approach in the context of publicly visible payoffs, one with Boolean payoff functions, and another with mean payoff functions. Remember that in the latter case, public visibility is required to have decidability (Theorem 1).

The epistemic game has a specific structure, which can be used for algorithmic purposes. The main outcome of a potential Nash equilibrium is given by a ⊥-play, that is, a play visiting only epistemic states s with s(⊥) ≠ ∅. There are now two types of deviations:


Using such an approach and results of [16] on generalized parity games, we obtain the following result for Boolean ω-regular payoff functions:

**Theorem 3.** *The constrained existence problem is in* EXPSPACE *and* EXPTIME*-hard for concurrent games with public signal and publicly visible Boolean payoff functions associated with parity conditions. The lower bound holds even for Büchi conditions and two players.*

The same approach could be used for the ordered objectives of [9], which are finite preference relations over sets of ω-regular properties. Also, we believe we can enrich the epistemic game construction and provide an algorithm to decide the constrained existence problem for Boolean ω-regular invisible payoff functions.

We have also investigated publicly visible mean payoff functions. While we could have used the same bottom-up approach as above and applied results from [12,13], we adopt an approach similar to that of [11], which consists in transforming the winning condition of Eve in E_G into a so-called *polyhedron query* in a multi-dimensional mean-payoff game. Given such a game, a polyhedron query asks whether there exists a strategy for Eve which achieves a payoff belonging to some given polyhedron. Using this approach, we get the following result:

**Theorem 4.** *The constrained existence problem is in* NP^NEXPTIME *(hence in* EXPSPACE*) and* EXPTIME*-hard for concurrent games with public signal and publicly visible mean payoff functions.*

### **5 Conclusion**

In this paper, we have studied concurrent games with imperfect monitoring modelled using signals. We have given some undecidability results, even in the case of public signals, when the payoff functions are not publicly visible. We have then proposed a construction to capture single-player deviations in games with public signals, and reduced the search for Nash equilibria to the synthesis of winning strategies in a two-player turn-based game (with a rather complex winning condition though). We have applied this general framework to two classes of payoff functions, and obtained decidability results.

As further work we wish to understand better if there could be richer communication patterns which would allow representable knowledge structures for Nash equilibria and thereby the synthesis of Nash equilibria under imperfect monitoring. A source of inspiration for further work will be [28].

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **WQO Dichotomy for 3-Graphs**

Sławomir Lasota(B) and Radosław Piórkowski

Institute of Informatics, University of Warsaw, Warsaw, Poland sl@mimuw.edu.pl

**Abstract.** We investigate data-enriched models, like Petri nets with data, where executability of a transition is conditioned by a relation between the data values involved. The decidability status of various decision problems in such models may depend on the structure of the data domain. According to the WQO Dichotomy Conjecture, if a data domain is homogeneous then it either exhibits a well quasi-order (in which case decidability follows by standard arguments), or essentially all the decision problems are undecidable for Petri nets over that data domain.

We confirm the conjecture for data domains being 3-graphs (graphs with 2-colored edges). On the technical level, this result is a significant step beyond known classification results for homogeneous structures.

### **1 Introduction**

In Petri nets with data, tokens carry values from some data domain, and executability of transitions is conditioned by a relation between the data values involved. One can consider *unordered data*, like in [25], i.e., an infinite data domain with equality as the only relation; or *ordered data*, like in [21], i.e., an infinite, densely totally ordered data domain; or timed data, like in timed Petri nets [1] and timed-arc Petri nets [15]. In [19] an abstract setting of Petri nets over an arbitrary fixed data domain A has been introduced, parametric in a relational structure A. The setting uniformly subsumes unordered, ordered and timed data (represented by A = (N, =), A = (Q, ≤) and A = (Q, ≤, +1), respectively).

Following [19], in order to enable finite presentation of Petri nets with data, and in particular to consider such models as input to algorithms, we restrict to relational structures A that are *homogeneous* [23] and *effective* (the formal definitions are given in Sect. 2). Certain standard decision problems (like the termination problem, the boundedness problem, or the coverability problem, jointly called from now on *standard problems*) are all decidable for Petri nets with ordered data [21] (and in consequence also for Petri nets with unordered data), as the model fits into the framework of well-structured transition systems of [11].

S. Lasota—Partially supported by the European Research Council (ERC) project Lipa under the EU Horizon 2020 research and innovation programme (grant agreement No. 683080).

R. Piórkowski—Partially supported by the Polish NCN grant 2016/21/B/ST6/01505.

c The Author(s) 2018

C. Baier and U. Dal Lago (Eds.): FOSSACS 2018, LNCS 10803, pp. 548–564, 2018. https://doi.org/10.1007/978-3-319-89366-2\_30

Most importantly, the structure A = (Q, ≤) of ordered data *admits well quasi-order* (wqo) in the following sense: for any wqo X, the set of finite induced substructures of (Q, ≤) (i.e., finite total orders) labeled by elements of X, ordered naturally by embedding, is a wqo (this is exactly Higman's lemma). Moreover, essentially the same argument can be used for any other homogeneous effective data domain which admits wqo (see [19] for details). On the other hand, for certain homogeneous effective data domains A the standard problems all become undecidable. In the quest for understanding the decidability borderline, the following hypothesis has been formulated in [19]:

*Conjecture 1 (Wqo Dichotomy Conjecture [19]).* For an effective homogeneous structure A, either A admits wqo (in which case the standard problems are decidable for Petri nets with data A), or all the standard problems are undecidable for Petri nets with data A.

According to [19], the conjecture could have been equivalently stated for other data-enriched models, e.g., for finite automata with one register [2]. In this paper we consider, for the sake of presentation, only Petri nets with data. The Wqo Dichotomy Conjecture holds in the special cases where the data domains A are undirected or directed graphs, due to the known classifications of homogeneous graphs [6,18].

**Contributions.** We confirm the Wqo Dichotomy Conjecture for data domains A being *strongly*<sup>1</sup> homogeneous *3-graphs*. A 3-graph is a logical structure with three irreflexive symmetric binary relations such that every pair of distinct elements belongs to exactly one of the relations (essentially, a clique with 3-colored edges).

Our main technical contribution is a complex analysis of the possible shapes of strongly homogeneous 3-graphs, constituting the heart of the proof. We believe that this is a significant step towards a full classification of homogeneous 3-graphs. The classification of homogeneous structures is a well-known challenge in model theory, and has so far been solved only in some cases: for undirected graphs [18], directed graphs (the proof of Cherlin spans a book [6]), multi-partite graphs [16], and a few others (the survey [23] is an excellent overview of homogeneous structures). Although the full classification of homogeneous 3-graphs was not our primary objective, we believe that our analysis significantly improves our understanding of these structures and can be helpful for classification.

Our result does not fully settle the status of the Wqo Dichotomy Conjecture. Dropping the (mild) strong homogeneity assumption, as well as extending the proof to arbitrarily many symmetric binary relations, is left for future work.

**Related Research.** Net models similar to Petri nets with data have been continuously proposed since the 1980s, including, among others, high-level Petri nets [13], colored Petri nets [17], unordered and ordered data nets [21], ν-Petri nets [25],

<sup>1</sup> Strong homogeneity is a mild strengthening of homogeneity.

and constraint multiset rewriting [5,8,9]. Petri nets with data can also be considered as a reinterpretation of the classical definition of Petri nets in sets with atoms [3,4], where one allows for *orbit-finite* sets of places and transitions instead of just finite ones. The decidability and complexity of standard problems for Petri nets over various data domains have attracted a lot of attention recently, see for instance [14,21,22,24,25].

Wqos are important for their wide applicability in many areas. Studies of wqos similar to ours, in the case of graphs, have been conducted by Ding [10] and Cherlin [7]; their framework is different though, as they concentrate on the subgraph ordering while we investigate the *induced* subgraph (or substructure) ordering.

### **2 Petri Nets with Homogeneous Data**

In this section we provide all necessary preliminaries. Our setting follows [19] and is parametric in the underlying logical structure A, which constitutes a *data domain*. Here are some example data domains:

– *equality data* A₌ = (N, =): natural numbers with equality as the only relation;
– *nested equality data* A₁ = (N², =₁, =), where (n, m) =₁ (n′, m′) iff n = n′;
– *totally ordered data* A≤ = (Q, ≤): rational numbers with their dense total order.

Note that the two latter structures essentially extend the first one: in each case the equality is either present explicitly, or is definable. From now on, we always assume a fixed countably infinite relational structure A with equality, over a finite vocabulary (signature) Σ.

**Petri Nets with Data.** Petri nets with data are exactly like classical place/transition Petri nets, except that tokens carry data values and these data values must satisfy a prescribed constraint when a transition is executed. Formally, a *Petri net with data* A consists of two disjoint finite sets P (places) and T (transitions), the arcs A ⊆ P×T ∪ T×P, and two labelings:

– each arc is labeled by a finite set of variables (the sets labeling different arcs being pairwise disjoint);
– each transition t ∈ T is labeled by a first-order formula φ over the vocabulary Σ whose free variables are the variables labeling the arcs incident to t.
*Example 1.* For illustration consider a Petri net with equality data A₌, with two places p₁, p₂ and two transitions t₁, t₂ depicted in Fig. 1. Transition t₁ outputs two tokens with arbitrary but distinct data values onto place p₁. Transition t₂

**Fig. 1.** A Petri net with equality data, with places P = {p₁, p₂} and transitions T = {t₁, t₂}. In the shown configuration, t₂ can be fired: it consumes two tokens carrying 3, and puts, e.g., a token carrying 4 on p₁ and tokens carrying 4, 6 on p₂.

inputs two tokens with the same data value, say a, one from p₁ and one from p₂, and outputs 3 tokens: two tokens with arbitrary but equal data values, say b, one onto p₁ and the other onto p₂; and one token with a data value c ≠ a onto p₂. Note that the transition t₂ does not specify whether b = a, or b = c, or b ≠ a, c, and therefore all three options are allowed. Variables y₁, y₂ can be considered as input variables of t₂, while variables z₁, z₂, z₃ can be considered as output ones; analogously, t₁ has no input variables, and two output ones x₁, x₂.

The formal semantics of Petri nets with data is given by translation to multiset rewriting. Given a set X, finite or infinite, a finite multiset over X is a finite (possibly empty) partial function from X to the positive integers. In the sequel, let M(X) stand for the set of all finite multisets over X. A *multiset rewriting system* (P, T) consists of a set P together with a set of rewriting rules:

$$\mathcal{T} \subseteq \mathcal{M}(\mathcal{P}) \times \mathcal{M}(\mathcal{P}).$$

Configurations C ∈ M(P) are finite multisets over P, and the step relation −→ between configurations is defined as follows: for every (I, O) ∈ T and every M ∈ M(P), there is a step (+ stands for multiset union)

$$M + I \quad \longrightarrow \ M + O.$$

For instance, a classical Petri net induces a multiset rewriting system where P is the set of places, and T is essentially the set of transitions, both P and T being finite. Configurations correspond to markings.
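This step relation is easy to model executably; the following minimal Python sketch (function and rule names are illustrative, not from the paper) encodes finite multisets as `collections.Counter` objects:

```python
from collections import Counter

def steps(M, rules):
    """Enumerate all configurations reachable from M in one step.

    M is a finite multiset (a Counter) over the set of places P;
    rules is a list of pairs (I, O) of multisets.  A rule applies
    when I is contained in M, and the step M' + I --> M' + O
    replaces I by O.
    """
    for I, O in rules:
        if all(M[p] >= n for p, n in I.items()):  # I fits inside M
            yield (M - I) + O

# A classical 2-place Petri net as a multiset rewriting system:
# one rule that consumes a token on 'p1' and produces two on 'p2'.
rules = [(Counter({'p1': 1}), Counter({'p2': 2}))]
succ = list(steps(Counter({'p1': 2}), rules))
```

Running `steps` on the configuration {p1: 2} yields the single successor {p1: 1, p2: 2}.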

A Petri net with data A induces a multiset rewriting system (P, T), where P = P × A and is thus infinite. Configurations are finite multisets over P × A (cf. the configuration depicted in Fig. 1). The rewriting rules T are defined as

$$\mathcal{T} = \bigcup\_{t \in T} \mathcal{T}\_{t},$$

where the relation T_t ⊆ M(P) × M(P) is defined as follows. Let φ denote the formula labeling the transition t, and let X_i, X_o be the sets of input and output variables of t. Every valuation v_i : X_i → A gives rise to a multiset M_{v_i} over P, where M_{v_i}(p, a) is the (positive) number of variables x labeling the arc (p, t) with v_i(x) = a. Likewise for valuations v_o : X_o → A. Then let

$$\mathcal{T}\_t = \left\{ (M\_{v\_i}, M\_{v\_o}) \,|\, v\_i: X\_i \to \mathbb{A}, \,\, v\_o: X\_o \to \mathbb{A}, \,\, v\_i, v\_o \models \phi \right\}.$$

Like P, the set of rewriting rules T is infinite in general.

As usual, for a net N and its configuration C, a run of (N,C) is a maximal, finite or infinite, sequence of steps starting in C.

*Remark 1.* As for classical Petri nets, an essentially equivalent definition can be given in terms of vector addition systems (such a variant was used in [14] for equality data). Petri nets with equality data are equivalent to (even if defined differently than) the unordered data Petri nets of [21], and Petri nets with totally ordered data are equivalent to the ordered data Petri nets of [21].

**Effective Homogeneous Structures.** For two relational Σ-structures A and B we say that A *embeds* in B, written A ⊑ B, if A is isomorphic to an induced substructure of B, i.e., to a structure obtained by restricting B to a subset of its domain. This is witnessed by an injective function<sup>2</sup> h : A → B, which we call an *embedding*. We write Age(A) = { A a finite structure | A ⊑ A } for the class of all finite structures that embed into A, and call it *the age of* A.
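For finite structures, the embedding relation can be decided by brute-force search over injective maps; the sketch below (an illustrative encoding of a structure as a domain plus a dict of binary relations, not the paper's formalism) checks induced-substructure embedding:

```python
from itertools import permutations

def embeds(A, B):
    """Does finite structure A embed into finite structure B?

    A structure is a pair (domain, rels), where rels maps relation
    names to sets of pairs.  An embedding h must be injective and
    both preserve and reflect every relation (induced substructure).
    Exponential brute force -- fine for tiny examples only.
    """
    dom_a, rels_a = A
    dom_b, rels_b = B
    for image in permutations(dom_b, len(dom_a)):
        h = dict(zip(dom_a, image))
        if all(((h[x], h[y]) in rels_b[r]) == ((x, y) in rels_a[r])
               for r in rels_a for x in dom_a for y in dom_a):
            return True
    return False

# A single edge embeds into a triangle, but a 2-element edgeless
# graph does not embed into a triangle as an *induced* substructure.
edge = ([0, 1], {'E': {(0, 1), (1, 0)}})
non_edge = ([0, 1], {'E': set()})
triangle = ([0, 1, 2],
            {'E': {(a, b) for a in range(3) for b in range(3) if a != b}})
```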

Homogeneous structures are defined through their automorphisms: A is homogeneous if every isomorphism between two of its finite induced substructures extends to an automorphism of A. In the sequel we will also need an equivalent definition using amalgamation. An *amalgamation instance* consists of three structures A, B₁, B₂ ∈ Age(A) and two embeddings h₁ : A → B₁ and h₂ : A → B₂. A solution of such an instance is a structure C ∈ Age(A) and two embeddings g₁ : B₁ → C and g₂ : B₂ → C such that g₁ ∘ h₁ = g₂ ∘ h₂ (we refer the reader to [12] for further details). Intuitively, C represents a 'gluing' of B₁ and B₂ along the partial bijection h₂ ∘ h₁⁻¹. In this paper we will restrict ourselves to *singleton* amalgamation instances, where only one element of B₁ is outside of h₁(A), and likewise for B₂.

An example singleton amalgamation instance is shown on the right, where the graph A consists of the single edge connecting two middle black nodes, B<sup>1</sup> is the left triangle, and B<sup>2</sup> the right one. The dashed line represents an edge that may

(but does not have to) appear in a solution. A is homogeneous if and only if every amalgamation instance has a solution; in that case we say that Age(A) has the *amalgamation property*. See [23] for further details.

A solution C necessarily satisfies g₁(h₁(A)) = g₂(h₂(A)) ⊆ g₁(B₁) ∩ g₂(B₂); a solution is *strong* if g₁(h₁(A)) = g₁(B₁) ∩ g₂(B₂). Intuitively, this forbids additional gluing of B₁ and B₂ not specified by the partial bijection h₂ ∘ h₁⁻¹. If every amalgamation instance has a strong solution we call A *strongly homogeneous*. This is a mild restriction, as homogeneous structures are typically strongly homogeneous.

<sup>2</sup> We deliberately do not distinguish a structure *<sup>A</sup>* from its domain set.

The equality, nested equality, and total order data domains are strongly homogeneous structures. For instance, in the latter case finite induced substructures are just finite total orders, which satisfy the strong amalgamation property. Many other natural classes of structures have the amalgamation property: finite graphs, finite directed graphs, finite partial orders, finite tournaments, etc. Each of these classes is the age of a strongly homogeneous relational structure, namely the *universal graph* (also called the random graph), the universal directed graph, the universal partial order, and the universal tournament, respectively. Examples of homogeneous structures abound [23].

Homogeneous structures admit quantifier elimination: every first-order formula is equivalent to (i.e., defines the same set as) a quantifier-free one [23]. Thus it is safe to assume that formulas labeling transitions are quantifier-free.

**Admitting wqo.** A *well quasi-order* (wqo) is a well-founded quasi-order with no infinite antichains. For instance, finite multisets M(P) over a finite set P, ordered by multiset inclusion ⊑, are a wqo. Another example is the embedding quasi-order in Age(A≤) (= all finite total orders), which is isomorphic to the ordering of the natural numbers. Finally, the embedding quasi-order in Age(A) can be lifted from finite structures to finite structures *labeled* by elements of some ordered set (X, ≤): for two such labeled structures a : A → X and b : B → X we define a ⊑_X b if some embedding h : A → B satisfies a(x) ≤ b(h(x)) for every x ∈ A. We say that A *admits* wqo when for every wqo (X, ≤), the lifted embedding order ⊑_X is a wqo too. For instance, A≤ admits wqo by Higman's lemma. The Wqo Dichotomy Conjecture for homogeneous undirected (and also directed) graphs follows easily by inspection of the classifications thereof [6,18]:

**Theorem 1.** *A homogeneous graph* A *either admits* wqo*, or all standard problems are undecidable for Petri nets with data* A*.*

Note the natural correspondence between configurations of a Petri net with data <sup>A</sup>, and structures A ∈ Age(A) labeled by finite multisets over the set P of places:

$$\mathcal{M}(P \times \mathbb{A}) \quad \equiv \quad \{ m: \mathcal{A} \to \mathcal{M}(P) \,|\, \mathcal{A} \in \mathrm{Age}(\mathbb{A}) \}.$$

Thus the lifted embedding quasi-order ⊑_{M(P)} is an order on configurations.

**Standard Decision Problems.** A Petri net with data N can be finitely represented by the finite sets P, T, A and the appropriate labelings with variables and formulas. Due to the homogeneity of A, a configuration C can be represented (up to automorphism of A) by a structure A ∈ Age(A) labeled by M(P). We can thus consider the classical decision problems that take Petri nets with data A as input, like the *termination problem*: does a given (N, C) have only finite runs? The data domain is considered as a parameter, and hence does not itself constitute part of the input. Another classical problem is the *place non-emptiness problem* (markability): given (N, C) and a place p of N, does (N, C) admit a run that puts at least one token on place p? One can also define the appropriate variants of the coverability problem (equivalent to the place non-emptiness problem), the boundedness problem, the evitability problem, etc. (see [19] for details). All the decision problems mentioned above are jointly called *standard problems*.

A Σ-structure A is called *effective* if the following *age problem* for A is decidable: given a finite Σ-structure A, decide whether A ⊑ A. If A admits wqo, then an application of the framework of well-structured transition systems [11] to the lifted embedding order ⊑_{M(P)} yields:

**Theorem 2 (**[19]**).** *If an effective homogeneous structure* A *admits* wqo *then all the standard problems are decidable for Petri nets with data* A*.*

### **3 Results**

A 3-graph G = (V, C₁, C₂, C₃) consists of a set V and three irreflexive symmetric binary relations C₁, C₂, C₃ ⊆ V² such that every pair of distinct elements of V belongs to exactly one of the three relations. In the sequel we treat a 3-graph as a clique with 3-colored edges. Any graph, including A₌ and A₁, can be seen as a 3-graph. Our main result confirms the Wqo Dichotomy Conjecture for strongly homogeneous 3-graphs:

**Theorem 3.** *An effective strongly homogeneous 3-graph* G *either admits* wqo*, or all standard problems are undecidable for Petri nets with data* G*.*
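The defining conditions of a 3-graph can be checked mechanically on finite candidates; a small illustrative Python sketch (the encoding is ours, not from the paper):

```python
def is_3graph(V, C1, C2, C3):
    """Check the 3-graph axioms on a finite candidate: each relation
    is irreflexive and symmetric, and every pair of distinct elements
    of V belongs to exactly one of C1, C2, C3."""
    rels = (C1, C2, C3)
    for C in rels:
        if any(v == w for v, w in C):           # irreflexivity
            return False
        if any((w, v) not in C for v, w in C):  # symmetry
            return False
    return all(sum((v, w) in C for C in rels) == 1
               for v in V for w in V if v != w)

# A triangle with one edge of color 1 and two of color 2
# (the third color is unused but still a legal color class):
V = {0, 1, 2}
C1 = {(0, 1), (1, 0)}
C2 = {(1, 2), (2, 1), (0, 2), (2, 0)}
ok = is_3graph(V, C1, C2, set())
```

Dropping the C2 edges leaves pairs covered by no color, so the check fails.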

The core technical result of the paper is Theorem 4 below. A *path* is a finite graph with nodes {v₁, ..., vₙ} whose only edges are the pairs {vᵢ, vᵢ₊₁}. The nodes v₁, vₙ are the *ends* of the path, and n is its length.

**Theorem 4.** *A strongly homogeneous 3-graph* G *either admits* wqo*, or for some* i, j ∈ {1, 2, 3} *(not necessarily distinct) the graph* (V, Cᵢ ∪ Cⱼ) *contains arbitrarily long paths as induced subgraphs.*

In the rest of the paper we concentrate solely on (parts of) the proof of Theorem 4. The omitted parts, as well as the proof that Theorem 4 implies Theorem 3, can be found in the full version of this paper [20].

*Example 2.* For a quasi-order (X, ≤), the multiset inclusion ⊑ is defined as follows for m, m′ ∈ M(X): m is included in m′ if m is obtained from m′ by a sequence of operations, where each operation either removes some element, or replaces some element by a smaller one wrt. ≤. The structure A₌ = (N, =) admits wqo. Indeed, Age(A₌) contains just finite pure sets, thus ⊑_X is quasi-order-isomorphic to the multiset inclusion on M(X), and is therefore a wqo whenever the underlying quasi-order (X, ≤) is. Similarly, A₁ = (N², =₁, =) also admits wqo, as ⊑_X is quasi-order-isomorphic to the multiset inclusion on M(M(X)).

On the other hand, consider the 3-graph (N², =₁, =₂, ≠₁₂), where =₂ is symmetric to =₁ and (n, m) ≠₁₂ (n′, m′) if n ≠ n′ and m ≠ m′. It refines A₁ and does not admit wqo. Indeed, in agreement with Theorem 4, the graph (N², =₁ ∪ =₂) contains arbitrarily long paths of the shape presented on the right, where the two colors depict =₁ and =₂, respectively, and the lack of color corresponds to ≠₁₂. Note that (N², =₁, =₂, ≠₁₂) is homogeneous but not strongly so.
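The multiset inclusion used in this example can be sketched as a naive backtracking search (an illustrative encoding, with Python lists standing for multisets and `leq` for the underlying quasi-order ≤):

```python
def multiset_included(m1, m2, leq):
    """Decide whether multiset m1 is included in m2 wrt leq: every
    element of m1 must be matched, injectively, to a leq-larger
    element of m2 (this captures 'remove an element or replace it by
    a smaller one').  Naive backtracking, exponential in the worst
    case -- for illustration only."""
    if not m1:
        return True
    x, rest = m1[0], m1[1:]
    return any(
        leq(x, y) and multiset_included(rest, m2[:i] + m2[i + 1:], leq)
        for i, y in enumerate(m2)
    )

le = lambda x, y: x <= y
# [1, 2] is included in [3, 3] wrt <=, but [1, 1, 1] is not in [2, 2]:
small_in_big = multiset_included([1, 2], [3, 3], le)
too_many = multiset_included([1, 1, 1], [2, 2], le)
```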

### **4 Proof of Theorem 4**

From now on we consider a fixed 3-graph G = (V, C₁, C₂, C₃) as the data domain, assuming G to be countably infinite and strongly homogeneous. We treat G as a clique with 3-colored edges: we call C₁, C₂ and C₃ *colors* and put *Colors* = {C₁, C₂, C₃} ⊂ P(V × V). To denote individual colors from this set, we will use variables **a**, **b**, **c** and **x**, **y**, **z**. A path in the graph (V, **a** ∪ **b**) is called an **ab***-path* (**a**, **b** ∈ *Colors*); for simplicity, we write **a***-path* instead of **aa**-path. Likewise we speak of **ab**-cliques, **a**-cliques, **ab**-cycles, etc. A *triangle* **abc** is a 3-clique with edges colored by **a**, **b**, **c**. (Note that **abc** = **bca** = **cba**.)

*Sketch of the Proof.* Lemma 1 below states that any 3-graph G has to meet one of the four listed cases. It splits the proof into four separate paths:

We present in detail only one of the three nontrivial paths, the one corresponding to case (C). Cases (A) and (B) are treated in the full version [20]. Case (A) constitutes the most difficult part of the proof and involves a complex and delicate analysis of consequences of the amalgamation property. It consists of four steps that deduce extensions of the assumed induced substructures by individual vertices, individual edges, and paths of length 2, respectively, culminating in the derivation of arbitrarily long paths. Thus in case (A) only the second condition of Theorem 4 is possible, while in the other two cases both conditions of Theorem 4 may hold true.

**Lemma 1.** *Every homogeneous 3-graph* G = (V, C₁, C₂, C₃) *satisfies one of the following conditions:*

*(A) for some color* **<sup>c</sup>** <sup>∈</sup> *Colors ,* <sup>G</sup> *contains the following induced substructures:*

*(B) for some colors* **x** ≠ **y***,* (V, **x** ∪ **y**) *is a union of disjoint cliques,*

*(C) for some color* **<sup>x</sup>***,* (V, **<sup>x</sup>**) *is a union of finitely many disjoint infinite cliques,*

*(D) for some colors* **x** ≠ **y***,* (V, **x** ∪ **y**) *contains arbitrarily long paths.*

*Proof.* By Ramsey's theorem, G contains arbitrarily large monochromatic cliques. Let us state a slightly stronger requirement:

**Condition ♠:** For some **a**, **c** ∈ *Colors*, G contains arbitrarily large **c**-cliques and a triangle **acc** with exactly two **c**-edges (**a** ≠ **c**).

Consider two cases, depending on whether the condition ♠ is satisfied or not.

**Case** 1◦**.** Assume that G contains both arbitrarily large **c**-cliques and a triangle **acc** for some **<sup>a</sup>**, **<sup>c</sup>** <sup>∈</sup> *Colors*. Let **<sup>b</sup>** be the third, remaining color. Our goal will be to show that either (A) or (B) holds.

If the graph (V, **a** ∪ **b**) is a disjoint sum of cliques, we immediately obtain (B). Suppose the contrary. Then G has to contain one of the three possible counterexamples to transitivity of the relation **a** ∪ **b**:

If it contains the triangle **aac** or **abc**, case (A) holds.

Suppose we got **bbc**. Let us check this time whether the colors **a** and **c** form a union of disjoint cliques. Again, if so, we easily get (B), so we assume the contrary. Similarly, we necessarily obtain one of the following triangles:

This time case (A) also holds for two out of the three triangles above:


It only remains to consider the situation when we got **aab**. We use it together with the previously obtained triangle **bbc** to build the following instance of singleton amalgamation:

Depending on the color of the dashed edge, in the solution we get one of the following triangles:

and each one alone completes the requirements of (A). This closes case 1◦.

**Case 2◦.** Suppose condition ♠ is false. Recall that G contains arbitrarily large **c**-cliques for some **c** ∈ *Colors*. Since ♠ does not hold, the graph contains no triangle **cca** – in other words, the color **c** appears only within cliques. We conclude that (V, **c**) is a union of disjoint cliques. Clearly at least one of these cliques has to be infinite. By homogeneity we get that all the cliques in (V, **c**) have to be infinite. Now our target is to show that either (C) or (D) holds.

Case (C) is fulfilled when there are only finitely many **c**-cliques. Let us assume the contrary. In each of the **c**-cliques we choose one vertex. The chosen vertices, together with the edges between them, form an infinite **ab**-clique K. Using Ramsey's theorem again, we conclude that in K one of the colors **a**, **b** forms arbitrarily large monochromatic cliques. W.l.o.g. suppose that this color is **b**.

If the graph G contained **ybb** for some **y** ≠ **b**, then the assumptions of ♠ would be met, leading to a contradiction. Therefore we conclude that (V, **b**) is a union of disjoint infinite **b**-cliques.

When there are only finitely many **b**-cliques, condition (C) is fulfilled. Otherwise we know that G is a union of infinitely many **x**-cliques for both **x** = **c** and **x** = **b**. Using homogeneity, it is easy to show that then every pair of differently colored cliques has *exactly one* common vertex, so the graph G takes the form depicted in Example 2. A graph of this form contains arbitrarily long **bc**-paths, so the requirements of (D) are met.

#### **4.1 Case (C)**

Let **c** be the color that satisfies condition (C), and **a**, **b** the remaining two colors. In this section we often treat G as the k-partite graph (V, **a** ∪ **b**) (for some k ∈ N): the k cliques of color **c** allow us to distinguish k groups of vertices V₁ ∪ V₂ ∪ ··· ∪ Vₖ = V (from now on we refer to them as layers). The remaining two colors can be interpreted as existence (**a**) and non-existence (**b**) of edges between these groups.

*Remark:* We observe that the special color **c** between vertices within each layer Vᵢ ensures that the automorphisms of G do not 'mix' those layers: when two vertices u, v belong to a common layer Vᵢ, their images f(u), f(v) also belong to some common layer Vⱼ, no matter which automorphism f ∈ Aut(G) we choose. Obviously, automorphisms can switch positions of whole layers, e.g. move all vertices from Vᵢ to some Vⱼ and vice versa; in this respect the layers are indistinguishable.

**Lemma 2.** *For every* i, j ∈ {1, 2, ..., k} *and* **a** ∈ *Colors* *(***a** ≠ **c***), the bipartite graph* Gᵢ,ⱼ = (Vᵢ ∪ Vⱼ, **a** ∩ (Vᵢ ∪ Vⱼ)², Vᵢ, Vⱼ) *(with two distinguishable sides* Vᵢ, Vⱼ*) is homogeneous.*

The vertex sets Vᵢ and Vⱼ are used here as unary relations that allow one to tell the two layers of Gᵢ,ⱼ (the sides of Gᵢ,ⱼ) apart. An example is shown on the right, with three layers V₁, V₂ and V₃, and three bipartite graphs G₁,₂, G₂,₃ and G₁,₃.

*Proof.* Fix a bipartite graph Gᵢ,ⱼ. To prove its homogeneity we have to show that each isomorphism between two of its finite induced subgraphs may be extended to some automorphism of Gᵢ,ⱼ. Let us then take an isomorphism f : G₁ → G₂ between some finite induced subgraphs G₁, G₂ of Gᵢ,ⱼ. It is easy to extend it to a full automorphism when it 'touches' both layers of Gᵢ,ⱼ, i.e.:

$$V(G\_1) \cap V\_i \neq \emptyset \,\,\wedge \,\, V(G\_1) \cap V\_j \neq \emptyset$$

where V(G₁) is the set of vertices of G₁. In this case, by homogeneity of G, we construct a full automorphism f′ : G → G which extends f. It is easy to see that in this case f′ has to fix the layers Vᵢ and Vⱼ, and hence f′ restricted to the graph Gᵢ,ⱼ is a correct automorphism of this graph.

Things get more complicated when f operates only on a single layer of Gᵢ,ⱼ. W.l.o.g. suppose that it 'touches' only Vᵢ, so V(G₁) ∩ Vⱼ = ∅. Now the above construction will not work out of the box: if we are unlucky, the automorphism of G we get by homogeneity moves the whole layer Vⱼ to some Vₙ located 'outside' the graph Gᵢ,ⱼ (n ∉ {i, j}).

It will be handy to make the following observation: when f 'touches' only Vᵢ, we may assume that V(G₁) ∩ V(G₂) = ∅. Indeed, every function g : G₁ → G₂ that violates this condition may be decomposed as g = f₂ ∘ f₁ for some f₁, f₂:

$$G_1 \xrightarrow{f_1} H \xrightarrow{f_2} G_2$$

such that $H$ is disjoint from both $G_1$ and $G_2$.

Now, let $N = |V(G_1)| = |V(G_2)|$ be the size of the domain of the isomorphism $f$. Let us take an arbitrary infinite family $(S_n)_{n \in \mathbb{N}}$ of subgraphs of $G$ with disjoint vertex sets, such that the following conditions are met:

– $|V(S_n) \cap V_m| = 1$ for $m \neq i$ (this single vertex will be denoted by $v_m^{(n)}$),
– $|V(S_n) \cap V_i| = N$ (denote these vertices by $s_1^{(n)}, s_2^{(n)}, \dots, s_N^{(n)}$).

We define the *connection type* of the layer $V_i$ with $V_m$ in the graph $S_n$ as the $N$-element sequence of colors of the edges from the list below:

$$(\{s_1^{(n)}, v_m^{(n)}\}, \{s_2^{(n)}, v_m^{(n)}\}, \dots, \{s_N^{(n)}, v_m^{(n)}\})$$

E.g. in the graph below, the connection type of the layer $V_i = V_3$ with $V_1$ is **abba**, and with $V_2$ it is **aaba** (remembering that **b** is treated as lack of an edge):

Furthermore, we define the type of the graph $S_n$ to be the sequence of connection types arising between $V_i$ and the other layers, plus the list of edge colors between all pairs of vertices $v_\bullet^{(n)}$ (enumerated in some consistent way). As there are only finitely many such types, by the pigeonhole principle there exists a pair of graphs $S_a$ and $S_b$ with the same type.

Let us fix some order on the vertices of $G_1$: $V(G_1) = \{g_1, g_2, \dots, g_N\}$. Let $h$ be the partial isomorphism that moves the vertices as follows:

$$\begin{aligned} s_1^{(a)} &\to g_1 & s_1^{(b)} &\to f(g_1) \\ &\;\;\vdots & &\;\;\vdots \\ s_N^{(a)} &\to g_N & s_N^{(b)} &\to f(g_N) \end{aligned}$$

By homogeneity, it extends to a full automorphism $h' \in \mathrm{Aut}(G)$. In particular, in the neighbourhood of $G_1$ and $G_2$ there will be images of all vertices $v_\bullet^{(\alpha)}$ of the graphs $S_a$ and $S_b$:

$$h'(v_1^{(\alpha)}), h'(v_2^{(\alpha)}), \dots, h'(v_{i-1}^{(\alpha)}), h'(v_{i+1}^{(\alpha)}), \dots, h'(v_k^{(\alpha)})$$

(for $\alpha \in \{a, b\}$). It follows that $G_1$ with the added vertices $h'(v_\bullet^{(a)})$ has the same type as $G_2$ with the added vertices $h'(v_\bullet^{(b)})$, respectively (that type may differ from the type of $S_a$ and $S_b$, though!). This is best illustrated by a picture:

Above, the colored triangles represent the types of connections. The order of those types may get permuted when applying $h'$, but still, in line with the remark, for each $\beta \in \{1, 2, \dots, k\} \setminus \{i\}$ the vertex $h'(v_\beta^{(a)})$ must stay in the same layer as $h'(v_\beta^{(b)})$; furthermore, their type of connection with the layer $V_i$ is preserved.

Extending the isomorphism $f$ in a natural way (thanks to the compatibility of types) to those newly obtained vertices:

$$h'\left(v_{\bullet}^{(a)}\right) \xrightarrow{\;f'\;} h'\left(v_{\bullet}^{(b)}\right)$$

we get an isomorphism $f'$ that this time 'operates' on all layers $V_\bullet$. If we now extend it to an automorphism of the whole $G$, we get a function that fixes all layers $V_\bullet$. This function may be safely restricted to $V_i \cup V_j$, remaining a correct automorphism of our initial bipartite graph $G_{i,j}$, which completes the proof.

We are going to apply the following classification result to the graphs $G_{i,j}$:

**Theorem 5 (**[16]**).** *A countably infinite homogeneous bipartite graph (with distinguishable sides) is either empty, or full, or a perfect matching, or the complement of a perfect matching, or a* universal *graph.*

From our point of view, all we need to know about the universal graph is that it contains arbitrarily long paths, which, translated into our notation, means that $G_{i,j}$ contains arbitrarily long **a**-paths. Therefore, in our further considerations we assume that $G_{i,j}$ is not universal, which leaves two types of $G_{i,j}$:


Graphs of type 2 may be seen as bijections between their sets of vertices (layers). Lemma 3 states that those bijections have to agree with each other.

**Lemma 3.** *Let $V_i, V_j, V_k$ be arbitrary pairwise different layers such that $G_{i,j}$ is of type 2, and let $\psi : V_i \to V_j$ be the bijection it determines. Then $\psi$ takes $\mathbf{a} \cap (V_i \cup V_k)$ to $\mathbf{a} \cap (V_j \cup V_k)$, or to its complement. Formally:*

$$\left(\forall_{u\in V_i}\,\forall_{v\in V_k}\ \underbrace{u\ \mathbf{a}\ v}_{\clubsuit} \Leftrightarrow \underbrace{\psi(u)\ \mathbf{a}\ v}_{\spadesuit}\right) \ \lor\ \left(\forall_{u\in V_i}\,\forall_{v\in V_k}\ \underbrace{\neg\, u\ \mathbf{a}\ v}_{\heartsuit} \Leftrightarrow \underbrace{\psi(u)\ \mathbf{a}\ v}_{\diamondsuit}\right)$$

*Proof.* We head towards a contradiction. Negating the claim we get:

$$\left(\exists_{u\in V_i}\,\exists_{v\in V_k}\ (\neg\clubsuit \wedge \spadesuit) \lor (\clubsuit \wedge \neg\spadesuit)\right) \ \wedge\ \left(\exists_{u\in V_i}\,\exists_{v\in V_k}\ (\neg\heartsuit \wedge \diamondsuit) \lor (\heartsuit \wedge \neg\diamondsuit)\right)$$

which leads to four cases with similar proofs. We will consider one of them (corresponding to $\neg\heartsuit \wedge \diamondsuit$ and $\clubsuit \wedge \neg\spadesuit$) and omit the others. Let us then assume that there exist $x, x' \in V_i$ and $y, y' \in V_k$ such that:

$$x\ \mathbf{a}\ y \ \wedge\ x'\ \mathbf{a}\ y' \ \wedge\ \psi(x)\ \mathbf{a}\ y \ \wedge\ \neg\,\psi(x')\ \mathbf{a}\ y'.$$

Let $g$ be the partial isomorphism $g = \{x \to x',\ y \to y'\}$. By homogeneity of $G$, there is a full automorphism $g' \in \mathrm{Aut}(G)$ extending $g$. If additionally we were able to force $g'$ to fix the layer $V_j$, we would be almost done. Let us try to achieve that property.

For that purpose, we choose in $V_j$ a vertex $v$ such that:


Clearly such a vertex must exist: the two conditions above exclude at most 4 vertices from the infinite set of candidates. The function $g$ extended with $v \xrightarrow{g} v$ stays a correct isomorphism, because:


The presence of the vertex $v$ ensures that the layer $V_j$ is preserved by the full automorphism $g' \in \mathrm{Aut}(G)$ we get by homogeneity.

Since $G_{i,j}$ is of type 2, the vertex $\psi(x')$ is the only possible choice for the image of $\psi(x)$ under $g'$: it is the only vertex that $x'$ is connected to by an appropriately colored edge. Because $g'$ is an automorphism, we get that $\psi(x')\ \mathbf{a}\ y'$, which contradicts our assumption.

From the lemma we have just proved one easily derives the following corollary:

**Corollary 1.** *The following relation* ≡ *on layers is transitive:*

$V_i \equiv V_j \;\Leftrightarrow\;$ *the graph $G_{i,j}$ is of type 2.*

*Furthermore, if $V_i \equiv V_j$ and $V_j \equiv V_k$, then $f_{j,k} \circ f_{i,j} = f_{i,k}$, where $f_{i,j}, f_{i,k}, f_{j,k}$ are the bijections determined by the graphs $G_{i,j}$, $G_{i,k}$ and $G_{j,k}$.*

In Lemma 5 below, which is the last step of the proof of case (C), we will apply the following fact:

**Lemma 4.** *Consider a homogeneous 3-graph $G$ and a partition of its vertex set $V = \bigcup_{n\in\mathbb{N}} U_n$ into sets $U_\bullet$ of equal finite cardinality. Suppose further that for every $n \in \mathbb{N}$ there is an automorphism $\pi_n$ of $G$ that swaps $U_0$ with $U_n$ and is the identity elsewhere. Then $G$ admits wqo.*

*Proof.* Let $G = (V, \mathbf{a}, \mathbf{b}, \mathbf{c})$ be a 3-graph. For $u \in U_0$ define the sets $V_u \subseteq V$, which we call *layers*:

$$V_u = \left\{ \pi_n(u) \mid n \in \mathbb{N} \right\}.$$

$$V = \bigcup_{u \in U_0} V_u$$

We will prove that the structure $G' = (V, \mathbf{a}, \mathbf{b}, \mathbf{c}, (V_u)_{u\in U_0})$ admits wqo. This will imply that $G$ admits wqo as well; indeed, compared to $G$, the structure $G'$ is equipped with additional unary relations $V_\bullet$, which only makes the order in $\mathrm{Age}(G')$ finer than the analogous order in $\mathrm{Age}(G)$.

Let $G'_n$ denote the induced substructure of $G'$ on the vertex set $U_n$. By the assumptions, for every $n, m \in \mathbb{N}$ there is a swap of $U_n$ and $U_m$ that, extended with the identity elsewhere, is an automorphism of $G'$. In consequence, all structures $G'_\bullet$ are isomorphic, and the embedding order of induced substructures of $G'$ is isomorphic to the order of finite multisets over $\mathrm{Age}(G'_0)$ under multiset inclusion. Thus $(\mathrm{Age}(G'), \sqsubseteq)$ is isomorphic to the multiset inclusion in $\mathcal{M}(\mathrm{Age}(G'_0))$, which is a wqo as $U_0$ is finite. For any wqo $(X, \leq)$, an analogous isomorphism holds between the lifted embedding order $(\mathrm{Age}(G'), \sqsubseteq_X)$ and the multiset inclusion on multisets over induced substructures of $G'_0$ labeled by elements of $X$, and again the latter order is a wqo. Thus $G'$ admits wqo.

**Lemma 5.** *The 3-graph* G *admits* wqo*.*

*Proof.* We prepare the ground for the use of Lemma 4. By Corollary 1, the vertex set $V$ partitions into $V = \bigcup_{n\in\mathbb{N}} U_n$ so that


Intuitively, $G$ can be cut into thin 'slices' perpendicular to the layers $V_\bullet$. By thin we mean that the slices have exactly one vertex in each layer. The cut is made along the bijections dictated by the graphs of type 2, as in the picture below:

We observe that for every $n$, the bijection $h_n : V \to V$ that swaps $U_1$ and $U_n$ along the only layer-preserving bijection $U_1 \to U_n$, and is the identity elsewhere, is an automorphism of $G$. Indeed, for any three slices $U_a, U_b, U_c$ we have:

$$v_i^{(a)}\ \mathbf{a}\ v_j^{(c)} \Leftrightarrow v_i^{(b)}\ \mathbf{a}\ v_j^{(c)}$$

so the edges $\{v_i^{(a)}, v_j^{(c)}\}$ and $\{v_i^{(b)}, v_j^{(c)}\}$ are colored the same way. The above equivalence is obvious when $G_{i,j}$ is a graph of type 1. In the case of a graph of type 2, the vertex $v_i^{(c)}$ is connected to all vertices of $V_j$ but one by $\mathbf{x}$-edges for some $\mathbf{x} \in \{\mathbf{a}, \mathbf{b}\}$. However, the special vertex $f_{i,j}(v_i^{(c)})$ that is not connected by an $\mathbf{x}$-edge also belongs, by condition (b), to $U_c$, so it does not interfere with the above equivalence.

By Lemma 4 we deduce that $G$ admits wqo, which completes the proof.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Verifying Higher-Order Functions with Tree Automata**

Thomas Genet, Timothée Haudebourg(B), and Thomas Jensen

Univ. Rennes, Inria, IRISA, Rennes, France timothee.haudebourg@irisa.fr

**Abstract.** This paper describes a fully automatic technique for verifying safety properties of higher-order functional programs. Tree automata are used to represent sets of reachable states and functional programs are modeled using term rewriting systems. From a tree automaton representing the initial state, a completion algorithm iteratively computes an automaton which over-approximates the output set of the program to verify. We identify a subclass of higher-order functional programs for which the completion is guaranteed to terminate. Precision and termination are obtained conjointly by a careful choice of equations between terms. The verification objective can be used to generate sets of equations automatically. Our experiments show that tree automata are sufficiently expressive to prove intricate safety properties and sufficiently simple for the verification result to be certified in Coq.

### **1 Introduction**

Higher-order functions are an integral feature of modern programming languages such as Java, Scala or JavaScript, not to mention Haskell and Caml. Higher-order functions are useful for program structuring but pose a challenge when it comes to reasoning about the correctness of programs that employ them. To this end, the correctness-minded software engineer can opt for proving properties interactively with the help of a proof assistant such as Coq [13] or Isabelle/HOL [30], or write a specification in a formalism such as Liquid Types [31] or Bounded Refinement Types [33,34] and ask an SMT solver whether it can prove the verification conditions generated from this specification. This approach requires expertise in the formal method used, and both the proof construction and the annotation phase can be time-consuming.

Another approach is based on *fully automated* verification tools, where the proof is carried out automatically without annotations or intermediate lemmas. This approach is accessible to a larger class of programmers but applies to a more restricted class of program properties. The flow analysis of higher-order functions was studied by Jones [21] who proposed to model higher-order functions as term rewriting systems and use regular grammars to approximate the result. More recently, the breakthrough results of Ong [29] and Kobayashi [23,24,26] show that combining abstraction with model checking techniques can be used with success to analyse higher-order functions automatically. Their approach relies on abstraction for computing over-approximations of the set of reachable states, on which safety properties can then be verified.

In this paper, we pursue the goals of higher-order functional verification using an approach based on the original term rewriting models of Jones. We present a formal verification technique based on Tree Automata Completion (TAC) [20], capable of checking a class of properties, called *regular properties*, of higher-order programs in a fully automatic manner. In our approach, a program is represented as a term rewriting system R and the set of (possibly infinite) inputs to this program as a tree automaton A. The TAC algorithm computes a new automaton A<sup>∗</sup>, by *completing* A with all terms reachable from A by R-rewriting. This automaton representation of the *reachable terms* contains all intermediate states as well as the final output of the program. Checking correctness properties of the program is then reduced to checking properties of the computed automaton. Moreover, our completion-based approach makes it possible to automatically *certify* A<sup>∗</sup> in Coq [6], i.e. given A, R and A<sup>∗</sup>, to obtain the formal proof that A<sup>∗</sup> recognizes all terms reachable from A by R-rewriting.

*Example 1.* The following term rewriting system R defines the *filter* function along with the two predicates even and odd on Peano's natural numbers.

$$\begin{aligned} @(@(\mathit{filter}, \underline{p}), cons(\underline{x}, \underline{l})) &\to \mathtt{if}\ @(\underline{p}, \underline{x})\ \mathtt{then}\ cons(\underline{x}, @(@(\mathit{filter}, \underline{p}), \underline{l}))\\ &\phantom{\to}\ \mathtt{else}\ @(@(\mathit{filter}, \underline{p}), \underline{l})\\ @(@(\mathit{filter}, \underline{p}), nil) &\to nil\\ @(even, 0) &\to \mathit{true} & @(even, s(\underline{x})) &\to @(odd, \underline{x})\\ @(odd, 0) &\to \mathit{false} & @(odd, s(\underline{x})) &\to @(even, \underline{x}) \end{aligned}$$

This function returns the input list where all elements not satisfying the input boolean function p are filtered out. Variables are underlined, and the special symbol @ denotes function application: @(f, x) is the application of the function f to the argument x.

We want to check that for all lists l of natural numbers, @(@(*filter* , *odd*), l) filters out all even numbers. One way to do this is to write a higher-order predicate, *exists*, and check that there exists no even number in the resulting list, i.e. that @(@(*exists*, *even*), @(@(*filter* , *odd*), l)) always rewrites to *false*. Let A be the tree automaton recognising terms of form @(@(*exists*, *even*), @(@(*filter* , *odd*), l)) where l is any list of natural numbers. The completion algorithm computes an automaton A<sup>∗</sup> recognising every term reachable from L(A) (the set of terms recognised by A) using R with the definition of the *exists* function. Formally,

$$L(\mathcal{A}^*) = \mathcal{R}^*(L(\mathcal{A})) = \{ t \mid \exists s \in L(\mathcal{A}),\ s \to_{\mathcal{R}}^{*} t \}.$$

To prove the expected property, it suffices to check that *true* is not reachable, i.e. that *true* does not belong to the regular set L(A<sup>∗</sup>). We denote by *regular properties* the family of properties characterised by a regular set. In particular, regular properties cannot count symbols in terms, nor relate subterm heights (a property comparing the length of the list before and after *filter* is not regular).

Termination of the tree automata completion algorithm is not ensured in general [19]. For instance, if R<sup>∗</sup>(L(A)) is not regular, it cannot be represented as a tree automaton. In this case, the user can provide a set of *equations* that will force termination by introducing an approximation based on *equational abstraction* [27]: L(A<sup>∗</sup>) ⊇ R<sup>∗</sup>(L(A)). Equations make TAC powerful enough to verify first-order functional programs [19]. However, state-of-the-art TAC has two shortcomings. (i) Equations must be given by the user, which goes against full automation, and (ii) even with equations, termination is not guaranteed in the case of *higher-order programs*. In this paper we propose a solution to these shortcomings with the following contributions:


All proofs missing in this paper can be found in the accompanying technical report [17]. The paper is organised as follows: We describe the completion algorithm and how to use equations to ensure termination in Sect. 2.1. The technical contributions described above are developed in Sects. 3 to 5. In Sect. 6, we present a series of experiments validating our verification technique, and discuss the certification of results in Coq. We present related work in Sect. 7. Section 8 concludes the paper.

### **2 Background**

This section introduces basic concepts used throughout the paper. We recall the usual definitions of term rewriting systems and tree automata, and present the completion algorithm which forms the basis of our verification technique.

### **2.1 Term Rewriting and Tree Automata**

**Terms.** An alphabet $\mathcal{F}$ is a finite set of symbols, with an arity function $ar : \mathcal{F} \to \mathbb{N}$. Symbols represent constructors such as cons or nil, or functions such as *filter*, etc. For simplicity, we also write $f \in \mathcal{F}^n$ when $f \in \mathcal{F}$ and $ar(f) = n$. For instance, $cons \in \mathcal{F}^2$ and $nil \in \mathcal{F}^0$. An alphabet $\mathcal{F}$ and a finite set of variables $\mathcal{X}$ induce a set of *terms* $\mathcal{T}(\mathcal{F}, \mathcal{X})$ such that:

$$\underline{x} \in \mathcal{T}(\mathcal{F}, \mathcal{X}) \Leftarrow \underline{x} \in \mathcal{X}$$

$$f(t_1, \dots, t_n) \in \mathcal{T}(\mathcal{F}, \mathcal{X}) \Leftarrow f \in \mathcal{F}^n \text{ and } t_1, \dots, t_n \in \mathcal{T}(\mathcal{F}, \mathcal{X})$$

A *language* is a set of terms. A term t is *linear* if the multiplicity of each variable in t is at most 1, and *closed* if it contains no variables. The set of closed terms is written <sup>T</sup> (F). A *position* in a term <sup>t</sup> is a word over <sup>N</sup> pointing to a *subterm* of t. *Pos*(t) is the set of positions in t, one for each subterm of t. It is defined by:

$$\begin{aligned} Pos(\underline{x}) &= \{\lambda\} \\ Pos(f(t\_1, \dots, t\_n)) &= \{\lambda\} \cup \{i.p \mid 1 \le i \le n \land p \in Pos(t\_i)\} \end{aligned}$$

where λ is the empty word and "." in i.p is the *concatenation* operator. For p ∈ *Pos*(t), we write $t|_p$ for the subterm of t at position p, and $t[s]_p$ for the term t where the subterm at position p has been replaced by s. We write $s \trianglerighteq t$ if t is a subterm of s, and $s \triangleright t$ if it is a subterm and $s \neq t$. If $L \subseteq \mathcal{T}(\mathcal{F})$, we write $\overline{L}$ for the language consisting of L and all its subterms. A *substitution* σ is a function $\sigma : \mathcal{X} \to \mathcal{T}(\mathcal{F}, \mathcal{X})$ mapping variables to terms. We tacitly extend it to the endomorphism $\sigma : \mathcal{T}(\mathcal{F}, \mathcal{X}) \to \mathcal{T}(\mathcal{F}, \mathcal{X})$, where tσ is the result of applying the substitution σ to the term t.
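As an illustration of positions and subterms, here is a minimal sketch (not from the paper): terms are encoded as nested tuples `("f", t1, ..., tn)`, leaves as plain strings, and positions as tuples of integers with `()` playing the role of the empty word λ.

```python
def positions(t):
    """Pos(t): the set of positions of all subterms of t."""
    if isinstance(t, str):          # a variable or a constant leaf
        return {()}
    pos = {()}                      # λ points at t itself
    for i, child in enumerate(t[1:], start=1):
        pos |= {(i,) + p for p in positions(child)}
    return pos

def subterm_at(t, p):
    """t|_p: the subterm of t at position p."""
    for i in p:
        t = t[i]                    # descend into child i of f(t1, ..., tn)
    return t

# cons(x, cons(y, nil)) as a nested tuple
t = ("cons", "x", ("cons", "y", "nil"))
print(sorted(positions(t)))    # λ, 1, 2, 2.1 and 2.2
print(subterm_at(t, (2, 1)))   # the subterm at position 2.1, namely y
```

The encoding and helper names are illustrative choices, not notation used by the paper.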

**Term Rewriting Systems** [1] provide a flexible way of defining functional programs and their semantics. A rewriting system is a pair $\langle \mathcal{F}, \mathcal{R} \rangle$, where $\mathcal{F}$ is an alphabet and $\mathcal{R}$ a set of rewriting rules of the form $l \to r$, where $l, r \in \mathcal{T}(\mathcal{F}, \mathcal{X})$, $l \notin \mathcal{X}$ and *Var*(r) ⊆ *Var*(l). A TRS can be seen as a set of rules, each of them defining one step of computation. We write $\mathcal{R}$ for a rewriting system $\langle \mathcal{F}, \mathcal{R} \rangle$ if there is no ambiguity on $\mathcal{F}$. A rewriting rule $l \to r$ is said to be left-linear if the term $l$ is linear. Example 1 shows a TRS representing a functional program, where each rule is left-linear; in that case we say that the TRS $\mathcal{R}$ is left-linear.

A rewriting system $\mathcal{R}$ induces a rewriting relation $\to_\mathcal{R}$ where, for all $s, t \in \mathcal{T}(\mathcal{F}, \mathcal{X})$, $s \to_\mathcal{R} t$ if there exist a rule $l \to r \in \mathcal{R}$, a position $p \in \mathit{Pos}(s)$ and a substitution σ such that $l\sigma = s|_p$ and $t = s[r\sigma]_p$. The reflexive-transitive closure of $\to_\mathcal{R}$ is written $\to_\mathcal{R}^*$. The rewriting system introduced in the previous example induces a rewriting relation $\to_\mathcal{R}$ where

$$@(@(\mathit{filter}, odd), cons(0, cons(s(0), nil))) \to_{\mathcal{R}}^{*} cons(s(0), nil)$$

The term cons(s(0), nil) is *irreducible* (no rule applies to it) and hence the result of the function call. We write *IRR*(R) for the set of irreducible terms of R.
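The rewriting relation can be sketched in a few lines of code. This is an illustrative toy, not the paper's implementation: terms are nested tuples, variables are strings starting with `"?"`, and for brevity rules are only applied at the root position, which suffices to normalise the even/odd calls of Example 1.

```python
# Rules: even(0) -> true, odd(0) -> false, even(s(x)) -> odd(x), odd(s(x)) -> even(x)
RULES = [
    (("@", "even", "0"), "true"),
    (("@", "odd", "0"), "false"),
    (("@", "even", ("s", "?x")), ("@", "odd", "?x")),
    (("@", "odd", ("s", "?x")), ("@", "even", "?x")),
]

def match(pattern, term, sub):
    """Extend substitution `sub` so that pattern under sub equals term, or None."""
    if isinstance(pattern, str) and pattern.startswith("?"):   # a variable
        if pattern in sub:
            return sub if sub[pattern] == term else None
        return {**sub, pattern: term}
    if isinstance(pattern, str) or isinstance(term, str):      # constant leaf
        return sub if pattern == term else None
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p_i, t_i in zip(pattern[1:], term[1:]):
        sub = match(p_i, t_i, sub)
        if sub is None:
            return None
    return sub

def apply(term, sub):
    """tσ: replace each variable of `term` by its image under `sub`."""
    if isinstance(term, str):
        return sub.get(term, term)
    return (term[0],) + tuple(apply(t, sub) for t in term[1:])

def normalize(term):
    """Rewrite at the root until no rule applies, i.e. ->*_R into IRR(R)."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            sub = match(lhs, term, {})
            if sub is not None:
                term = apply(rhs, sub)
                changed = True
                break
    return term

two = ("s", ("s", "0"))
print(normalize(("@", "even", two)))   # even(2) rewrites to true
```

A full implementation would search for redexes at every position p ∈ *Pos*(s), as in the definition above.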

**Tree Automata** [12] are a convenient way to represent regular sets of terms. A tree automaton is a quadruple $\langle \mathcal{F}, Q, Q_f, \Delta \rangle$ where $\mathcal{F}$ is an alphabet, $Q$ a finite set of states, $Q_f \subseteq Q$ the set of *final states*, and $\Delta$ a rewriting system on $\mathcal{F} \cup Q$. Rules in $\Delta$, called *transitions*, are of the form $l \to q$ where $q \in Q$ and $l$ is either a state ($\in Q$) or a *configuration* of the form $f(q_1, \dots, q_n)$ with $f \in \mathcal{F}$ and $q_1, \dots, q_n \in Q$. A term $t$ is *recognised* by a state $q \in Q$ if $t \to_\Delta^* q$, which we also write $t \to_\mathcal{A}^* q$. We write $L(\mathcal{A}, q)$ for the language of all terms recognised by $q$. A term $t$ is recognised by $\mathcal{A}$ if there exists $q \in Q_f$ s.t. $t \in L(\mathcal{A}, q)$; in that case we write $t \in L(\mathcal{A})$. *E.g.*, the tree automaton $\mathcal{A} = \langle \mathcal{F}, Q, Q_f, \Delta \rangle$ with $\mathcal{F} = \{0\!:\!0,\ s\!:\!1,\ nil\!:\!0,\ cons\!:\!2\}$, $Q_f = \{q_{list}\}$ and $\Delta = \{0 \to q_{pair},\ s(q_{odd}) \to q_{pair},\ s(q_{pair}) \to q_{odd},\ nil \to q_{list},\ cons(q_{pair}, q_{list}) \to q_{list}\}$ recognises all lists of even natural numbers.

An ε-*transition* is a transition $q \to q'$ where $q, q' \in Q$. A tree automaton $\mathcal{A}$ is ε-*free* if it contains no ε-transitions. $\mathcal{A}$ is *deterministic* if for all terms $t$ there is at most one state $q$ such that $t \to_\Delta^* q$. $\mathcal{A}$ is *reduced* if for all $q$ there is at least one term $t$ such that $t \to_\Delta^* q$.
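Bottom-up recognition by a deterministic, ε-free automaton is straightforward to sketch. The following toy (an assumption of this sketch, not code from the paper) encodes the transitions of the lists-of-even-naturals automaton above, with `qlist` taken as the final state for the list sort:

```python
# Δ as a map from configurations (symbol + child states) to states
DELTA = {
    ("0",): "qpair",
    ("s", "qodd"): "qpair",
    ("s", "qpair"): "qodd",
    ("nil",): "qlist",
    ("cons", "qpair", "qlist"): "qlist",
}

def run(t):
    """Return the unique state recognising t, or None (A is deterministic)."""
    if isinstance(t, str):                       # constant: 0 or nil
        return DELTA.get((t,))
    states = tuple(run(child) for child in t[1:])
    if None in states:                           # some subterm is rejected
        return None
    return DELTA.get((t[0],) + states)

two = ("s", ("s", "0"))
print(run(("cons", two, ("cons", "0", "nil"))))  # the list [2, 0] is accepted
print(run(("cons", ("s", "0"), "nil")))          # the list [1] is rejected
```

A term is in L(A) exactly when `run` returns a state of Q<sub>f</sub>.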

#### **2.2 Tree Automata Completion Algorithm**

The verification algorithm is based on **tree automata completion**. Given a program represented as a rewriting system R, and its input represented as a tree automaton A, the *tree automata completion algorithm* computes a new tree automaton A<sup>∗</sup> recognising the set of all *reachable terms* starting from a term in L(A). For a given R, we write this set R<sup>∗</sup>(L(A)) = {t | ∃s ∈ L(A), s →<sup>∗</sup> <sup>R</sup> <sup>t</sup>}. It includes all intermediate computations and, in particular, the *output* of the functional program. The algorithm proceeds by computing iteratively <sup>A</sup><sup>1</sup>, <sup>A</sup><sup>2</sup>,... such that <sup>A</sup><sup>i</sup>+1 <sup>=</sup> <sup>C</sup>R(A<sup>i</sup> ) until it reaches a fix-point, <sup>A</sup><sup>∗</sup>. Here, <sup>C</sup>R(A<sup>i</sup> ) represents *one step* of completion and is performed by searching and *completing* the *critical pairs* of <sup>A</sup><sup>i</sup> .

$$l\sigma \xrightarrow{\ \mathcal{R}\ } r\sigma, \quad l\sigma \xrightarrow[\mathcal{A}^i]{\ *\ } q \qquad \Rightarrow \qquad r\sigma \xrightarrow[\mathcal{A}^{i+1}]{\ *\ } q$$

**Definition 1 (Critical pair).** *A critical pair is a triple* $\langle l \to r, \sigma, q \rangle$ *where* $l \to r \in \mathcal{R}$*,* $\sigma$ *is a substitution, and* $q \in Q$ *such that* $l\sigma \to_{\mathcal{A}^i}^* q$ *and* $r\sigma \not\to_{\mathcal{A}^i}^* q$*.*

Completing a critical pair consists in adding the necessary transitions to $\mathcal{A}^{i+1}$ so that $r\sigma \to_{\mathcal{A}^{i+1}}^* q$, and hence $r\sigma \in L(\mathcal{A}^{i+1}, q)$.

*Example 2.* Let $\mathcal{A}^0$ be the previously defined tree automaton recognising all lists of even natural numbers. Let $\mathcal{R} = \{s(s(x)) \to s(x)\}$. $\mathcal{A}^0$ has a critical pair $\langle s(s(x)) \to s(x), \sigma, q_{pair} \rangle$ with $\sigma(x) = q_{pair}$. To *complete* the automaton, we need to add transitions such that $s(q_{pair}) \to_{\mathcal{A}^1}^* q_{pair}$. Since we already have the state $q_{odd}$ recognising $s(q_{pair})$, we only add the transition $q_{odd} \to q_{pair}$. The formal definition of the completion step, including the procedure for choosing which new transitions to introduce, can be found in [17].
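The effect of this completion step can be checked concretely. The sketch below (an illustration under the same toy encoding as before, not the paper's algorithm) computes which states a configuration reaches via Δ followed by ε-closure, before and after the transition q<sub>odd</sub> → q<sub>pair</sub> is added:

```python
# The relevant part of Δ for the natural-number states
DELTA = {("0",): "qpair", ("s", "qodd"): "qpair", ("s", "qpair"): "qodd"}
EPS = set()   # ε-transitions added by completion, as (source, target) pairs

def states_of(config):
    """All states reachable from a configuration: one Δ step, then ε-closure."""
    q = DELTA.get(config)
    if q is None:
        return set()
    reached, todo = {q}, [q]
    while todo:
        cur = todo.pop()
        for (src, dst) in EPS:
            if src == cur and dst not in reached:
                reached.add(dst)
                todo.append(dst)
    return reached

print(states_of(("s", "qpair")))   # before completion: only qodd
EPS.add(("qodd", "qpair"))         # the transition added in Example 2
print(states_of(("s", "qpair")))   # now qpair is reached too: pair closed
```

After the step, s(q<sub>pair</sub>) reaches q<sub>pair</sub>, which is exactly the requirement for closing the critical pair.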

Every completion step has the following property:

$$L(\mathcal{A}^i) \subseteq L(\mathcal{A}^{i+1}) \quad\text{and}\quad s \in L(\mathcal{A}^i) \wedge s \to_{\mathcal{R}} t \implies t \in L(\mathcal{A}^{i+1})$$

This implies that if completion reaches a fix-point $\mathcal{A}^*$, then $\mathcal{A}^*$ recognises every term of $\mathcal{R}^*(L(\mathcal{A}))$. However it is in general impossible to compute a tree automaton recognising $\mathcal{R}^*(L(\mathcal{A}))$ exactly, and this may cause the completion algorithm to diverge. Instead we shall over-approximate it by an automaton $\mathcal{A}^*$ such that $L(\mathcal{A}^*) \supseteq \mathcal{R}^*(L(\mathcal{A}))$. The approximation is performed by introducing a set $E$ of *equations* of the form $l = r$ where $l, r \in \mathcal{T}(\mathcal{F}, \mathcal{X})$. From $E$ we derive the relation $=_E$, the *smallest congruence* such that for every equation $l = r$ and substitution $\sigma$ we have $l\sigma =_E r\sigma$. In this paper we also write $E$ for the TRS $\{l \to r \mid l = r \in E\}$. At each completion step, the algorithm *simplifies* the automaton by merging states together according to $E$.

**Definition 2 (Simplification Relation).** *Let* $\mathcal{A} = \langle \mathcal{F}, Q, Q_f, \Delta \rangle$ *be a tree automaton and* $E$ *be a set of equations. If there are* $s = t \in E$*,* $\sigma : \mathcal{X} \to Q$ *and* $q, q' \in Q$ *such that* $s\sigma \to_\mathcal{A}^* q$*,* $t\sigma \to_\mathcal{A}^* q'$ *and* $q \neq q'$*, then* $\mathcal{A}$ *can be simplified into* $\mathcal{A}' = \mathcal{A}\{q' \mapsto q\}$ *(where* $q'$ *has been substituted by* $q$*), denoted by* $\mathcal{A} \leadsto_E \mathcal{A}'$*.*

We write $\mathcal{S}_E(\mathcal{A})$ for the unique automaton (up to renaming) $\mathcal{A}'$ such that $\mathcal{A} \leadsto_E^* \mathcal{A}'$ and $\mathcal{A}'$ is irreducible by $\leadsto_E$. One completion step is now defined by $\mathcal{A}^{i+1} = \mathcal{S}_E(\mathcal{C}_\mathcal{R}(\mathcal{A}^i))$.

$$s\sigma =_E t\sigma, \quad s\sigma \xrightarrow[\mathcal{A}]{\ *\ } q, \quad t\sigma \xrightarrow[\mathcal{A}]{\ *\ } q' \qquad \Rightarrow \qquad \mathcal{A}' = \mathcal{A}\{q' \mapsto q\}$$

*Example 3.* This example shows how using equations can lead to approximations in tree automata. Let $\mathcal{A}$ be the tree automaton defined by the set of transitions $\Delta = \{0 \to q_0,\ s(q_0) \to q_1\}$. This automaton recognises the two terms 0 in $q_0$ and $s(0)$ (also known as 1) in $q_1$. Let $E = \{s(x) = x\}$ contain the equation that equates a number and its successor. For $\sigma = \{x \mapsto 0\}$ we have $s(x)\sigma \to_\mathcal{A}^* q_1$, $x\sigma \to_\mathcal{A}^* q_0$ and $s(x)\sigma =_E x\sigma$. Then in $\mathcal{S}_E(\mathcal{A})$, $q_0$ and $q_1$ are merged. The resulting automaton has transitions $\{0 \to q_0,\ s(q_0) \to q_0\}$, which recognises $\mathbb{N}$ in $q_0$.
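The merge itself is a plain renaming of states inside Δ. A minimal sketch of this substitution step, in the same toy encoding used earlier (the function name is ours, not the paper's):

```python
def simplify(delta, q_old, q_new):
    """A{q_old -> q_new}: substitute q_old by q_new in every transition of Δ."""
    ren = lambda q: q_new if q == q_old else q
    return {tuple(ren(x) for x in lhs): ren(rhs) for lhs, rhs in delta.items()}

# Δ of Example 3: 0 -> q0, s(q0) -> q1
delta = {("0",): "q0", ("s", "q0"): "q1"}
merged = simplify(delta, "q1", "q0")
print(merged)   # {("0",): "q0", ("s", "q0"): "q0"}: now every s^n(0) is in q0
```

After the merge, the loop s(q<sub>0</sub>) → q<sub>0</sub> makes q<sub>0</sub> recognise every natural number, which is precisely the over-approximation described above.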

The idea behind the simplification is to over-approximate R<sup>∗</sup>(L(A)) when it is *not regular*. It has been shown in [19] that it is possible to tune the precision of the approximation. For a given TRS R, initial automaton A and set of equations E, the termination of the completion algorithm is undecidable in general, even with the use of equations. Our contribution in this paper consists in finding a class of TRS/programs and equations E for which the completion algorithm with equations terminates.

### **3 Termination of Tree Automata Completion**

In this section, we show that termination of the completion algorithm with a set of equations $E$ is ensured under the following conditions: (i) $\mathcal{A}^k$ is reduced, $\epsilon$-free and deterministic (written **REFD** in the rest of the paper) for all $k$; (ii) every term of $\mathcal{A}^k$ can be rewritten into a term of a given language $L \subseteq \mathcal{T}(\mathcal{F})$ using $\mathcal{R}$ (for instance if $\mathcal{R}$ is terminating); (iii) $L$ has a finite number of equivalence classes w.r.t. $E$. Completion is known to preserve reducedness and determinism if $E \supseteq E^r \cup E_{\mathcal{R}}$ [19], where $E_{\mathcal{R}} = \{s = t \mid s \to t \in \mathcal{R}\}$ and $E^r = \{f(x_1, \ldots, x_n) = f(x_1, \ldots, x_n) \mid f \in \mathcal{F}_n\}$. Condition (i) is ensured by showing that, in our verification setting, completion preserves REFD. The last condition is ensured by having $E \supseteq E^c_L$, where $E^c_L$ is a set of *contracting equations*.

**Definition 3 (Contracting Equations).** *Let* $L \subseteq \mathcal{T}(\mathcal{F})$. *A set of equations is contracting for* $L$*, denoted by* $E^c_L$*, if all equations of* $E^c_L$ *are of the form* $u = u|_p$ *with* $u$ *a linear term of* $\mathcal{T}(\mathcal{F}, \mathcal{X})$*,* $p \neq \lambda$*, and if the set of normal forms of* $L$ *w.r.t. the TRS* $\overrightarrow{E^c_L} = \{u \to u|_p \mid u = u|_p \in E^c_L\}$ *is finite.*

*Example 4.* Assume that $\mathcal{F} = \{0 : 0,\; s : 1\}$. The set $E^c_L = \{s(x) = x\}$ is contracting for $L = \mathcal{T}(\mathcal{F})$ because the set of normal forms of $\mathcal{T}(\mathcal{F})$ with respect to $\overrightarrow{E^c_L} = \{s(x) \to x\}$ is the (finite) set $\{0\}$. The set $E^c_L = \{s(s(x)) = x\}$ is also contracting because the set of normal forms of $\{s(s(x)) \to x\}$ is $\{0, s(0)\}$.
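Since closed terms over this signature are exactly the terms $s^n(0)$, the finiteness of the set of normal forms in Example 4 can be checked directly. A small sketch, using our own term encoding (not part of the paper):

```python
def normalise(term, rule):
    """Normalise a closed term over {0, s} w.r.t. a single contracting
    rule. Terms: '0' or ('s', t); `rule` maps a term to a reduct, or
    returns None when it does not apply."""
    changed = True
    while changed:
        changed = False
        reduct = rule(term)
        if reduct is not None:
            term, changed = reduct, True
    return term

def ssx_to_x(t):
    """The rule s(s(x)) -> x from Example 4 (applied at the root, which
    suffices for the unary chains s^n(0))."""
    if isinstance(t, tuple) and isinstance(t[1], tuple):
        return t[1][1]
    return None

def church(n):                      # build s^n(0)
    t = '0'
    for _ in range(n):
        t = ('s', t)
    return t

# Normal forms of all s^n(0) with n < 10: the finite set {0, s(0)}
nfs = {normalise(church(n), ssx_to_x) for n in range(10)}
print(nfs)                          # the finite set {0, s(0)}
```

Every tower of `s` symbols collapses to either `0` or `s(0)` depending on its parity, matching the example.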

The contracting equations ensure that the completion algorithm merges enough states during the simplification steps to terminate. Note that $E^c_L$ cannot be empty unless $L$ is finite. To prove termination of completion, we first prove that the number of states needed in $\mathcal{A}^*$ to recognise a language $L$ can be bounded by the number of normal forms of $L$ with respect to $\overrightarrow{E^c_L}$. In our case, $L$ is the set of output terms of the program. Since $\mathcal{A}^*$ recognises more than the output terms, we need additional states to recognise intermediate computation terms. In the proof of Theorem 1 we show that, with $E_{\mathcal{R}}$, the simplification steps merge the states recognising the intermediate computations with the states recognising the outputs. If the latter set of states is finite, then we can show that $\mathcal{A}^*$ is finite.

**Theorem 1.** *Let* $\mathcal{A}$ *be an REFD tree automaton,* $\mathcal{R}$ *a left-linear TRS,* $E$ *a set of equations and* $L$ *a language closed by subterms such that for all* $k \in \mathbb{N}$ *and all* $s \in \mathcal{L}(\mathcal{A}^k)$*, there exists* $t \in L$ *s.t.* $s \to_{\mathcal{R}}^{*} t$*. If* $E \supseteq E^r \cup E^c_L \cup E_{\mathcal{R}}$ *then the completion of* $\mathcal{A}$ *by* $\mathcal{R}$ *and* $E$ *terminates with an REFD* $\mathcal{A}^*$*.*

### **4 A Class of Analysable Programs**

The next step is to identify a class of functional programs and a language $L$ for which Theorem 1 applies. By choosing $L = \mathcal{T}(\mathcal{F})$ and providing a set of contracting equations $E^c_{\mathcal{T}(\mathcal{F})}$, the termination theorem above proves that the completion algorithm terminates on any functional program $\mathcal{R}$. While this works in theory, in practice we want to avoid introducing equations over the application symbol (such as $@(x, y) = y$). Contracting equations on applications make sense in certain cases, *e.g.*, with idempotent functions ($@(sort, @(sort, x)) = @(sort, x)$), but in most cases such equations dramatically lower the precision of the completion algorithm. Hence, we want to identify a language $L$ needing no contracting equations over $@$ in $E^c_L$. Since such a language $L$ still has to have a finite number of normal forms w.r.t. $E^c_L$ (Theorem 1), it cannot include terms containing an unbounded *stack* of applications. For instance, $L$ cannot contain all the terms of the form $@(f, x)$, $@(f, @(f, x))$, $@(f, @(f, @(f, x)))$, etc. The $@$ stack must be bounded, even if the application symbols are interleaved with other symbols (e.g. $@(f, s(@(f, s(@(f, s(x))))))$). To do so, we (i) define a set $\mathcal{B}^d$ of all terms whose application-stack size is bounded by $d \in \mathbb{N}$; (ii) define a set $\mathcal{K}^n$ and a class of TRSs called $\mathcal{K}$-TRSs such that, for any TRS $\mathcal{R}$ in this class, $\mathcal{K}^n$ is closed by $\mathcal{R}$ and $\mathcal{K}^n \cap IRR(\mathcal{R}) \subseteq \mathcal{B}^{\varphi(n)}$ for some function $\varphi$ (this is done by first introducing a type system over the terms); (iii) finally, define $L = \mathcal{B}^{\varphi(n)} \cap IRR(\mathcal{R})$, which can be used to instantiate Theorem 1.

**Definition 4.** *For a given alphabet* <sup>F</sup> <sup>=</sup> C∪{@}*,* <sup>B</sup><sup>d</sup> *is the set of terms where every application depth is bounded by* d*. It is the smallest set defined by:*

$$\begin{aligned} f \in \mathcal{B}^0 &\Leftarrow f \in \mathcal{C}^0\\ f(t_1, \ldots, t_n) \in \mathcal{B}^i &\Leftarrow f \in \mathcal{C}^n \land t_1, \ldots, t_n \in \mathcal{B}^i\\ @(t_1, t_2) \in \mathcal{B}^{i+1} &\Leftarrow t_1, t_2 \in \mathcal{B}^i\\ t \in \mathcal{B}^{i+1} &\Leftarrow t \in \mathcal{B}^i \end{aligned}$$
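Membership in $\mathcal{B}^d$ amounts to bounding the nesting depth of application symbols, with constructors passing the bound through unchanged. A small sketch of that check (our own term encoding, for illustration only):

```python
def app_depth(term):
    """Nesting depth of '@' in a term, so that t is in B^d iff
    app_depth(t) <= d. Applications are ('@', t1, t2); constructor
    applications are (f, t1, ..., tn); constants and variables are
    plain strings."""
    if isinstance(term, str):
        return 0
    if term[0] == '@':
        return 1 + max(app_depth(term[1]), app_depth(term[2]))
    return max(app_depth(t) for t in term[1:])   # constructors pass through

# @(f, s(@(f, s(@(f, s(x)))))) from the text: depth 3 despite the
# interleaved constructor s
t = ('@', 'f', ('s', ('@', 'f', ('s', ('@', 'f', ('s', 'x'))))))
print(app_depth(t))   # 3
```

So `t` belongs to $\mathcal{B}^3$ (and to every $\mathcal{B}^i$ with $i \geq 3$), but not to $\mathcal{B}^2$.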

In Sect. 5, we show how to produce $E^c$ such that $\mathcal{B}^d \cap IRR(\mathcal{R})$ has a finite number of normal forms w.r.t. $E^c$ with no equations on $@$. However, in general we do not have, for all $k$ and all terms $t \in \mathcal{L}(\mathcal{A}^k)$, a term $s \in \mathcal{B}^d \cap IRR(\mathcal{R})$ s.t. $t \to_{\mathcal{R}}^{*} s$. Hence Theorem 1 cannot be instantiated directly with $L = \mathcal{B}^d \cap IRR(\mathcal{R})$. Instead, we define (i) a set $\mathcal{K}^n \subseteq \mathcal{T}(\mathcal{F})$ and a function $\varphi$ such that $\mathcal{K}^n \cap IRR(\mathcal{R}) \subseteq \mathcal{B}^{\varphi(n)}$ and (ii) a class of TRSs, called $\mathcal{K}$-TRSs, for which $\mathcal{L}(\mathcal{A}^k) \subseteq \mathcal{K}^n$. In a $\mathcal{K}$-TRS, the right-hand sides of rules are contained in a set $\mathcal{K}$ whose purpose is to forbid the construction of unbounded partial applications during rewriting. If the initial automaton satisfies $\mathcal{L}(\mathcal{A}) \subseteq \mathcal{K}^n$, then we can instantiate Theorem 1 with $L = \mathcal{K}^n \cap IRR(\mathcal{R})$ and prove termination.

#### **4.1 Types**

In order to define <sup>K</sup> and <sup>K</sup><sup>n</sup> we require the TRS to be well-typed. Our definition of types is inspired by [1]. Let *A* be a non-empty set of *algebraic types*. The set of *types T* is inductively defined as the least set containing *A* and all function types, *i.e.* A → B ∈ *T* ⇐ A, B ∈ *T* . The function type constructor → is assumed to be right-associative. The *arity* of a type A is inductively defined on the structure of A by:

$$\begin{aligned} ar(A) &= 0 &&\Leftarrow A \in \mathcal{A}\\ ar(A \to B) &= 1 + ar(B) &&\Leftarrow A \to B \in \mathcal{T} \end{aligned}$$

Instead of alphabets, in a typed setting we use *signatures* $\mathcal{F} = \mathcal{C} \cup \{@\}$, where $\mathcal{C}$ is a set of *constructor* symbols, each associated with a unique type, and $@$ is the application symbol (with no type). We also assign a type to every variable. We write $f : A$ if the symbol $f$ has type $A$, and $t : A$ if the term $t \in \mathcal{T}(\mathcal{F}, \mathcal{X})$ has type $A$. We write $\mathcal{W}(\mathcal{F}, \mathcal{X})$ for the set of all *well-typed terms*, using the usual definition. We extend the definition of term rewriting systems to typed TRSs: a TRS is well typed if all its rules are of the form $l : A \to r : A$ (types are preserved by rewriting). In the same way, an equation $s = t$ is well typed if both $s$ and $t$ have the same type. In the rest of this paper we only consider well-typed equations and TRSs.

**Definition 5 (Functional TRS).** *A higher-order functional TRS is composed of rules of the form*

$$@(\ldots @(f,\, t_1 : A_1) \ldots,\, t_n : A_n) : A \;\to\; r : A$$

*where* $f : A_1 \to \ldots \to A_n \to A \in \mathcal{C}^n$*,* $t_1 \ldots t_n \in \mathcal{W}(\mathcal{C}, \mathcal{X})$ *and* $r \in \mathcal{W}(\mathcal{F}, \mathcal{X})$*. A functional TRS is* complete *if every term* $t = @(t_1, t_2) : A$ *with* $ar(A) = 0$ *can be rewritten using* $\mathcal{R}$*. In other words, all defined functions are total.*

Types provide information about how a term can be rewritten. For instance, we expect the term $@(f : A \to B, x : A) : B$ to be rewritten by every *complete* (no partial function) TRS $\mathcal{R}$ if $ar(A \to B) = 1$. Furthermore, for certain types we can guarantee the absence of partial applications in the result of a computation using the type's *order*. For a given signature $\mathcal{F}$, the *order* of a type $A$, written $ord(A)$, is inductively defined on the structure of $A$ by:

$$\begin{aligned} ord(A) &= \max\{ord(f) \mid f: \dots \to A \in \mathcal{C}^n\} \\ ord(A \to B) &= \max\{ord(A) + 1, ord(B)\} \end{aligned}$$

where $ord(f : A_1 \to \ldots \to A_n \to A) = \max\{ord(A_1), \ldots, ord(A_n)\}$ (with $ord(A_i) = 0$ when $A_i = A$). For instance, $ord(int) = 0$ and $ord(int \to int) = 1$.

*Example 5.* Define two different list types $list$ and $list'$. The first defines lists of $int$ with the constructor $consA : int \to list \to list \in \mathcal{C}$, while the second defines lists of functions with the constructor $consB : (int \to int) \to list' \to list' \in \mathcal{C}$. The importance of order becomes manifest here: a fully reduced term of type $list$ cannot contain any $@$, whereas a fully reduced term of type $list'$ can. Indeed, $ord(list) = 0$ and $ord(list') = 1$.
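The orders in Example 5 can be computed mechanically from the signature. A hedged sketch (our own encoding; the primed list type is written `listF` since primes are awkward in identifiers, and the constructors `zero`, `nil` and `nilF` are assumptions added for completeness):

```python
def ord_type(ty, sig, seen=frozenset()):
    """Order of a type. Algebraic types are strings; function types are
    ('->', A, B). `sig` maps each constructor to (argument types,
    result type). A recursive occurrence of the type under computation
    contributes 0, matching the side condition of the definition."""
    if isinstance(ty, str):                          # algebraic type
        if ty in seen:
            return 0
        args = [a for arg_tys, res in sig.values() if res == ty
                  for a in arg_tys]
        return max((ord_type(a, sig, seen | {ty}) for a in args), default=0)
    _, a, b = ty                                     # function type A -> B
    return max(ord_type(a, sig, seen) + 1, ord_type(b, sig, seen))

sig = {'zero':  ([], 'int'),
       's':     (['int'], 'int'),
       'nil':   ([], 'list'),
       'consA': (['int', 'list'], 'list'),
       'nilF':  ([], 'listF'),
       'consB': ([('->', 'int', 'int'), 'listF'], 'listF')}

print(ord_type('list', sig))                 # 0
print(ord_type('listF', sig))                # 1
print(ord_type(('->', 'int', 'int'), sig))   # 1
```

The list of integers has order 0, while the list of functions inherits order 1 from the argument type of `consB`.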

**Lemma 1.** *If* <sup>R</sup> *is a complete functional TRS and* <sup>A</sup> *a type such that* ord(A) = 0*, then all* closed *terms* t *of type* A *are rewritten into an irreducible term with no partial application:*

$$\forall s \in IRR(\mathcal{R}), \quad t \to\_{\mathcal{R}}^{\*} s \Rightarrow s \in \mathcal{B}^{0}.$$

### **4.2 The Class** *<sup>K</sup>***-TRS**

Recall that we want to define (i) a set $\mathcal{K}^n \subseteq \mathcal{T}(\mathcal{F})$ and $\varphi$ such that $\mathcal{K}^n \cap IRR(\mathcal{R}) \subseteq \mathcal{B}^{\varphi(n)}$ and (ii) a class of TRSs, $\mathcal{K}$-TRSs, for which $\mathcal{L}(\mathcal{A}^k) \subseteq \mathcal{K}^n$. Assuming that $\mathcal{L}(\mathcal{A}) \subseteq \mathcal{K}^n$, we instantiate Theorem 1 with $L = \mathcal{K}^n \cap IRR(\mathcal{R})$ and prove termination.

**Definition 6 (**$\mathcal{K}$**-TRS).** *A TRS* $\mathcal{R}$ *belongs to the class of* $\mathcal{K}$*-TRSs if, for all rules* $l \to r \in \mathcal{R}$*,* $r \in \mathcal{K}$*, where* $\mathcal{K}$ *is inductively defined by:*

$$\begin{aligned} x : A \in \mathcal{K} &\Leftarrow x : A \in \mathcal{X}\\ f(t_1, \ldots, t_n) : A \in \mathcal{K} &\Leftarrow f \in \mathcal{C}^n \land t_1, \ldots, t_n \in \mathcal{K}\\ @(t_1 : A \to B,\, t_2 : A) : B \in \mathcal{K} &\Leftarrow t_1 \in \mathcal{Z} \land t_2 \in \mathcal{K} \land B \in \mathcal{A} \qquad (1)\\ @(t_1 : A \to B,\, t_2 : A) : B \in \mathcal{K} &\Leftarrow t_1, t_2 \in \mathcal{K} \land ord(A) = 0 \qquad (2) \end{aligned}$$

*with* Z *defined by:*

$$\begin{aligned} t \in \mathcal{Z} &\Leftarrow t \in \mathcal{K}\\ @(t_1, t_2) \in \mathcal{Z} &\Leftarrow t_1 \in \mathcal{Z} \land t_2 \in \mathcal{K} \end{aligned}$$

By constraining the form of the right-hand side of each rule of $\mathcal{R}$, $\mathcal{K}$ defines a class of TRSs that cannot construct unbounded partial applications during rewriting. The definition of $\mathcal{K}$ takes advantage of the type structure and of Lemma 1. Rules (1) and (2) ensure that an application $@(t_1, t_2)$ is either: (1) a total application, so that the whole term can be rewritten; or (2) a partial application where $t_2$ can be rewritten into a term of $\mathcal{B}^0$ (Lemma 1). In (1), $\mathcal{Z}$ allows partial applications inside the total application of a multi-parameter function.

*Example 6.* Consider the classical map function. A typical call to this function is @(@(map, f), l) of type list, where f is a mapping function, and l a list. The whole term belongs to K because of rule (1): list is an algebraic type and its subterm @(map, f) : list → list belongs to Z. This subterm is a partial application, but there is no risk of stacking partial applications as it is part of a complete call (to the map function).

*Example 7.* Consider the function stack defined by:

$$\begin{aligned} &@(@(stack, x), 0) \to x\\ &@(@(stack, x), S(n)) \to @(@(stack, @(g, x)), n) \end{aligned}$$

Here g is a function of type (A → A) → A → A. The stack function returns a stack of partial applications whose height is equal to the input parameter:

$$@(@(stack, x), \underbrace{S(\ldots S(}_{n}0)\ldots)) \;\to_{\mathcal{R}}^{*}\; \underbrace{@(g, @(g, \ldots @(g,}_{n}\, x)\ldots))$$

The depth of partial application stacks in the output language is not bounded. With no equations on the $@$ symbol, the completion algorithm may not terminate. Notice that $x$ is a function and $@(g, x)$ a partial application. Hence the term $@(@(stack, @(g, x)), n)$ is not in $\mathcal{K}$, so this TRS does not belong to the class of $\mathcal{K}$-TRSs.
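The two examples above can be checked mechanically. Below is a hedged sketch (our own encoding, not Timbuk's) of the membership tests for $\mathcal{K}$ and $\mathcal{Z}$ from Definition 6, applied to Example 6's map call (accepted) and Example 7's stack term (rejected); the types given to `map`, `stack` and `g` are the ones assumed in the examples, with all algebraic types taken to have order 0.

```python
def ty(t):            # every typed term carries its type in last position
    return t[-1]

def ord_t(typ):
    # assumption: all algebraic types used below have order 0
    if isinstance(typ, str):
        return 0
    _, a, b = typ
    return max(ord_t(a) + 1, ord_t(b))

def in_K(t):
    if t[0] == 'var':                        # variables are in K
        return True
    if t[0] == '@':
        _, t1, t2, b = t
        arg = ty(t1)[1]                      # t1 : arg -> b
        total = isinstance(b, str) and in_Z(t1) and in_K(t2)    # rule (1)
        partial = in_K(t1) and in_K(t2) and ord_t(arg) == 0     # rule (2)
        return total or partial
    return all(in_K(s) for s in t[1])        # constructor application

def in_Z(t):
    if in_K(t):
        return True
    return t[0] == '@' and in_Z(t[1]) and in_K(t[2])

# Example 6: @(@(map, f), l) : list, with map : (int->int)->list->list
ii = ('->', 'int', 'int')
map_c = ('map', [], ('->', ii, ('->', 'list', 'list')))
call = ('@', ('@', map_c, ('var', ii), ('->', 'list', 'list')),
        ('var', 'list'), 'list')
print(in_K(call))      # True: rule (1), with @(map, f) in Z

# Example 7: @(@(stack, @(g, x)), n), with x : A -> A
aa = ('->', 'A', 'A')
stack_c = ('stack', [], ('->', aa, ('->', 'nat', aa)))
gx = ('@', ('g', [], ('->', aa, aa)), ('var', aa), aa)
bad = ('@', ('@', stack_c, gx, ('->', 'nat', aa)), ('var', 'nat'), aa)
print(in_K(bad))       # False: the partial application @(g, x) stacks up
```

The rejection of `bad` comes from `gx`: its argument type $A \to A$ has order 1, so neither rule (1) nor rule (2) applies to it.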

We define $\mathcal{K}^n$ as $\{t\sigma \mid t \in \mathcal{K},\; \sigma : \mathcal{X} \to \mathcal{B}^n \cap IRR(\mathcal{R})\}$ and claim that if, for all rules $l \to r$ of the functional TRS $\mathcal{R}$, $r \in \mathcal{K}$, and if $\mathcal{L}(\mathcal{A}) \subseteq \mathcal{K}^n$, then Theorem 1 proves that the completion of $\mathcal{A}$ with $\mathcal{R}$ terminates. This is formalised by the following theorem.


**Theorem 2.** *Let* $\mathcal{A}$ *be a* $\mathcal{K}^n$*-coherent REFD tree automaton,* $\mathcal{R}$ *a terminating functional TRS such that for all rules* $l \to r \in \mathcal{R}$*,* $r \in \mathcal{K}$*, and* $E$ *a set of equations. Let* $L = \mathcal{B}^{n+2B} \cap IRR(\mathcal{R})$*. If* $E = E^r \cup E^c_L \cup E_{\mathcal{R}}$ *then the completion of* $\mathcal{A}$ *by* $\mathcal{R}$ *and* $E$ *terminates.*

To prove that, after each step of completion, the recognised language stays in $\mathcal{K}^n$, we require the considered automaton to be $\mathcal{K}^n$-*coherent*.

**Definition 7 (**$\mathcal{K}^n$*-***coherence).** *Let* $L \subseteq \mathcal{W}(\mathcal{F})$ *and* $n \in \mathbb{N}$*.* $L$ *is* $\mathcal{K}^n$*-coherent if*

$$
L \subseteq \mathcal{K}^n \;\vee\; L \subseteq \mathcal{Z}^n \setminus \mathcal{K}^n
$$

*By extension, we say that a tree automaton* $\mathcal{A} = \langle\mathcal{F}, \mathcal{Q}, \mathcal{Q}_f, \Delta\rangle$ *is* $\mathcal{K}^n$*-coherent if the language recognised by every state* $q \in \mathcal{Q}$ *is* $\mathcal{K}^n$*-coherent.*

If $\mathcal{K}^n$-coherence is not preserved during completion, then some states in the completed automaton may recognise terms outside of $\mathcal{K}^n$. Our goal is to show that it is preserved first by $C_{\mathcal{R}}(\cdot)$ (Lemma 2) and then by $S_E(\cdot)$ (Lemma 3).

**Lemma 2 (**$C_{\mathcal{R}}(\mathcal{A})$ **preserves** $\mathcal{K}^n$*-***coherence).** *Let* $\mathcal{A}$ *be an REFD tree automaton. If* $\mathcal{A}$ *is* $\mathcal{K}^n$*-coherent, then* $C_{\mathcal{R}}(\mathcal{A})$ *is* $\mathcal{K}^n$*-coherent.*

**Lemma 3 (**$S_E(\mathcal{A})$ **preserves** $\mathcal{K}^n$*-***coherence).** *Let* $\mathcal{A}$ *be an REFD tree automaton,* $\mathcal{R}$ *a functional TRS and* $E$ *a set of equations such that* $E = E^r \cup E^c_L \cup E_{\mathcal{R}}$ *with* $L = \mathcal{B}^{n+2B} \cap IRR(\mathcal{R})$*. If* $\mathcal{A}$ *is* $\mathcal{K}^n$*-coherent then* $S_E(\mathcal{A})$ *is* $\mathcal{K}^n$*-coherent.*

By using Lemmas 2 and 3, we can prove that the completion algorithm, which is a composition of $C_{\mathcal{R}}(\cdot)$ and $S_E(\cdot)$, preserves $\mathcal{K}^n$-coherence. The proofs of these two lemmas are based on a detailed analysis of the completion algorithm itself. The complete proofs are provided in [17].

**Lemma 4 (Completion preserves** $\mathcal{K}^n$*-***coherence).** *Let* $\mathcal{A} = \langle\mathcal{F}, \mathcal{Q}, \mathcal{Q}_f, \Delta\rangle$ *be a tree automaton,* $\mathcal{R}$ *a functional TRS and* $E$ *a set of equations. If* $E = E^r \cup E^c_L \cup E_{\mathcal{R}}$ *with* $L = \mathcal{B}^{n+2B} \cap IRR(\mathcal{R})$ *and* $\mathcal{A}$ *is* $\mathcal{K}^n$*-coherent, then for all* $k \in \mathbb{N}$*,* $\mathcal{A}^k$ *is* $\mathcal{K}^n$*-coherent. In particular,* $\mathcal{A}^*$ *is* $\mathcal{K}^n$*-coherent.*

By construction, we can prove that the depth of irreducible $\mathcal{K}^n$ terms is bounded, which corresponds to the following lemma.

**Lemma 5.** *For all* $t : T \in \mathcal{K}^n$*,* $t : T \in IRR(\mathcal{R}) \Rightarrow t : T \in \mathcal{B}^{n + 2B - ar(T)}$*.*

#### **4.3 Proof of Theorem 2**

*Proof.* According to Lemma 4, for all $k \in \mathbb{N}$, the completed automaton $\mathcal{A}^k$ is $\mathcal{K}^n$-coherent. By definition, this implies that $\mathcal{L}(\mathcal{A}^k) \subseteq \mathcal{K}^n$. Moreover, we know that $IRR(\mathcal{R}) \cap \mathcal{K}^n \subseteq \mathcal{B}^{n+2B}$ (Lemma 5). Let $L = \mathcal{B}^{n+2B} \cap IRR(\mathcal{R})$. $\mathcal{R}$ is terminating, so for every term $s \in \mathcal{L}(\mathcal{A}^k)$ there exists $t \in L$ such that $s \to_{\mathcal{R}}^{*} t$. Since the number of normal forms of $L$ w.r.t. $E$ is finite, Theorem 1 implies that the completion of $\mathcal{A}$ by $\mathcal{R}$ and $E$ terminates.

### **5 Equation Generation**

Theorem 2 states a number of hypotheses that must be satisfied in order to guarantee termination of the completion algorithm.


In this section, we describe a method for generating all possible sets of contracting equations $E^c_L$. To simplify the presentation, we only present the case where $L = \mathcal{W}(\mathcal{C})$ and $IRR(\mathcal{R}) \subseteq \mathcal{W}(\mathcal{C})$ (*i.e.*, all results are first-order terms). Our approach looks for contracting equations for the set of closed terms $\mathcal{W}(\mathcal{C})$ instead of the set $\mathcal{B}^{n+2B}$ mentioned in Theorem 2. More precisely, we generate the set of equations iteratively, as a series of equation sets $E^k_c$ in which the equations only equate terms of depth at most $k$. Recall that a contracting equation is of the form $u = u|_p$ with $p \neq \lambda$, *i.e.*, it equates a term with a strict subterm of the same type. A set of contracting equations over $\mathcal{W}(\mathcal{C})$ is then generated as follows: (i) generate the set of left-hand sides of equations as a *covering set of terms* [25], so that for each term $t \in \mathcal{W}(\mathcal{C})$ there exists a left-hand side $u$ of an equation and a substitution $\sigma$ such that $t = u\sigma$; (ii) for each left-hand side $u$, generate all possible equations of the form $u = u|_p$ such that both sides have the same type; (iii) from all those equations, build all possible sets $E^c_L$ (with $L = \mathcal{W}(\mathcal{C})$) such that the set of normal forms of $\mathcal{W}(\mathcal{C})$ w.r.t. $E^c_L$ is finite. Since $E^c_L$ is left-linear and $L = \mathcal{W}(\mathcal{C})$, this can be decided efficiently [11].

*Example 8.* Assume that $\mathcal{C} = \{0 : 0,\; s : 1\}$. For $k = 1$, the covering set is $\{s(x), 0\}$ and $E^1_c = \{\{s(x) = x\}\}$. For depth 2, the covering set is $\{s(s(x)), s(0), 0\}$ and $E^2_c = E^1_c \cup \{\{s(s(x)) = x\},\; \{s(s(x)) = s(x)\},\; \{s(0) = 0\},\; \{s(0) = 0,\, s(s(x)) = x\},\; \{s(0) = 0,\, s(s(x)) = s(x)\}\}$. All equation sets of $E^1_c$ and $E^2_c$ satisfy Definition 3 and lead to different approximations.
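The finiteness condition of Definition 3 can be checked concretely on these candidate sets. Since closed terms over $\mathcal{C}$ are exactly the towers $s^n(0)$, each contracting rule acts on the tower height $n$; the encoding below is our own, for illustration, and represents a rule by the minimum height it needs and the number of $s$ symbols it removes:

```python
# rule encodings on s^n(0): (minimum n to apply, decrease of n)
RULES = {
    's(x) = x':       (1, 1),
    's(s(x)) = x':    (2, 2),
    's(s(x)) = s(x)': (2, 1),
    's(0) = 0':       (1, 1),   # applies at the bottom of any tower
}

def normal_form(n, eqs):
    """Height of the normal form of s^n(0) w.r.t. a set of equations."""
    changed = True
    while changed:
        changed = False
        for eq in eqs:
            lo, dec = RULES[eq]
            if n >= lo:
                n, changed = n - dec, True
    return n

# Every candidate set of Example 8 leaves finitely many normal forms:
for eqs in [{'s(x) = x'}, {'s(s(x)) = x'}, {'s(s(x)) = s(x)'},
            {'s(0) = 0'}, {'s(0) = 0', 's(s(x)) = x'}]:
    nfs = {normal_form(n, eqs) for n in range(30)}
    print(sorted(nfs))   # always a subset of {0, 1}, hence finite
```

For example, `{'s(s(x)) = x'}` keeps the parity of the tower and yields the two normal forms $0$ and $s(0)$, as in Example 4.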

To verify a property $\varphi$ on a program, we use completion and equation generation as follows. The program is represented by a TRS $\mathcal{R}$ and the function calls are represented by an initial tree automaton $\mathcal{A}$. Both have to respect the hypotheses of Theorem 2. The algorithm searches for a set of contracting equations $E^c$ such that verification succeeds, *i.e.*, $\mathcal{L}(\mathcal{A}^*)$ satisfies $\varphi$. Starting from $k = 1$, we apply the following algorithm:


If there exists a set of equations $E^c$ able to verify the program, this algorithm will eventually find it, or find a counter-example. However, if there is no set of equations that can verify the program, this algorithm does not terminate.
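The search loop just described can be sketched as follows. This is a toy driver, not Timbuk's actual API: `candidate_sets`, `complete` and `check` are stand-ins we assume for the equation generator, the completion algorithm and the property check.

```python
def search(candidate_sets, complete, check, max_k=None):
    """Iterative-deepening search for a set of contracting equations.
    candidate_sets(k) yields the equation sets of depth <= k;
    complete(E) returns the approximation L(A*) computed with E;
    check(L) returns 'verified', 'counter-example' or 'too-coarse'.
    Without max_k the loop may run forever, as noted in the text."""
    k = 0
    while max_k is None or k < max_k:
        k += 1
        for E in candidate_sets(k):
            verdict = check(complete(E))
            if verdict != 'too-coarse':
                return verdict, E
    return 'unknown', None

# Toy instantiation: only the depth-2 set {'s(s(x)) = x'} is precise enough
def cands(k):
    sets = [frozenset({'s(x) = x'})]
    if k >= 2:
        sets.append(frozenset({'s(s(x)) = x'}))
    return sets

complete = lambda E: {'even', 'odd'} if 's(s(x)) = x' not in E else {'even'}
check = lambda L: 'verified' if 'odd' not in L else 'too-coarse'

print(search(cands, complete, check, max_k=5))
# ('verified', frozenset({'s(s(x)) = x'}))
```

The driver reflects the behaviour described above: it succeeds as soon as one candidate set is precise enough, and without a depth bound it would loop forever when no such set exists.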

### **6 Experiments**

The verification technique described above has been integrated into the Timbuk library [16]. We implemented the naive equation generation where all possible equation sets $E^c$ are enumerated. Despite the evident scalability issues of this simple version of the verification algorithm, we have been able to verify a series of properties of several classical higher-order functions: *map*, *filter*, *exists*, *forall*, *foldRight*, *foldLeft*, as well as higher-order sorting functions parameterised by an ordering function. Most examples are taken from or inspired by [26,28] and have their TRSs in the class of $\mathcal{K}$-TRSs defined above. The property $\varphi$ consists in checking that a finite set of forbidden terms is not reachable (the Patterns section of Timbuk specifications).

Given $\mathcal{A}$, $\mathcal{R}$ and $\mathcal{A}^*$, the *correctness of the verification*, i.e. the fact that $\mathcal{L}(\mathcal{A}^*) \supseteq \mathcal{R}^*(\mathcal{L}(\mathcal{A}))$, can be checked in a proof assistant embedding a formalisation of rewriting and tree automata. It is enough to prove that (a) $\mathcal{L}(\mathcal{A}^*) \supseteq \mathcal{L}(\mathcal{A})$ and that (b) for all critical pairs $\langle l \to r, \sigma, q\rangle$ of $\mathcal{A}^*$ we have $r\sigma \to_{\mathcal{A}^*}^{*} q$. Property (a) can be checked using standard algorithms on tree automata. Property (b) can be checked by enumerating all critical pairs of $\mathcal{A}^*$ (there are finitely many) and by proving that all of them satisfy $r\sigma \to_{\mathcal{A}^*}^{*} q$. Since there exist algorithms for checking properties (a) and (b), the complete proof of correctness can be built automatically in the proof assistant. For instance, the automaton $\mathcal{A}^*$ can be used as a certificate to build the correctness proof in Coq [6] and in Isabelle/HOL [14]. It is also used to build unreachability proofs in Isabelle/HOL [14]. Besides, since verifying (a) and (b) is automatic, the correctness proof may be run outside of the proof assistant (in a more efficient way) using a formally verified external checker extracted from the formalisation. All our (successful) completion attempts output a comp.res file, containing $\mathcal{A}$, $\mathcal{R}$ and $\mathcal{A}^*$, which has been certified automatically using the external certified checker of [6]. Timbuk's site http://people.irisa.fr/Thomas.Genet/timbuk/funExperiments/ lists those verification experiments. Nine of them are automatically proven. Two other examples show that correct counter-examples are generated when the property is not provable. On one example, equation generation times out due to our naïve enumeration of equations. For this last case, providing the right set of equations by hand (mapTree2NoGen) makes the verification of the function succeed.

### **7 Related Work**

When it comes to verifying first-order imperative programs, there exist several successful tools based on abstract interpretation such as ASTREE [3] and SLAM [2]. The use of abstract interpretation for verifying higher-order functional programs has comparatively received less attention. The tree automaton completion technique is one analysis technique able to verify first-order Java programs [4]. Until now, the completion algorithm was guaranteed to terminate only in the case of first-order functional programs [19].

Liquid Types [31], followed by Bounded Refinement Types [33,34], and also Set-Theoretic Types [8,9], are all attempts to enrich the type system of functional languages to prove non-trivial properties on higher-order programs. However, these methods are not automatic. The user has to express the property he wants to prove using the type system, which can be tedious and/or difficult. In some cases, the user even has to specify straightforward intermediate lemmas to help the type checker.

The first attempt at verifying regular properties came with Jones [21] and Jones and Andersen [22]. Their technique computes a grammar overapproximating the set of states reachable by a rewriting system. However, their approximation is fixed and too rough to prove programs like Example 1 (*filter* odd). Our program and property models are close to those of Jones and Andersen. However, the approximation in our analysis is not fixed and can be automatically adapted to the verification objective.

Ong *et al.* propose one way of addressing the precision issue of Jones and Andersen's approach using a model checking technique on Pattern Matching Recursion Schemes [28] (PMRS). This technique improves the precision but is still not able to verify functions such as Example 1 (see [32], page 85). As shown in our experiments, our technique handles this example.

Kobayashi *et al.* developed a tree automata-based technique [26] (but not relying on TRS and completion), able to verify regular properties (including safety properties on Example 1). We have verified a selection of examples coming from [26] and observed that we can verify the same regular properties as they can. Our prototype implementation is inferior in terms of execution time, due to the slow generation of equations. A strength of our approach is that our verification results are certifiable and that they can be used as certificates to build unreachability proofs in proof assistants (see Sect. 6).

Our verification framework is based on regular abstractions and uses a simple abstraction mechanism based on equations. Regular abstractions are less expressive than Higher-Order Recursion Schemes [23,29] or Collapsible Pushdown Automata [7], and equation-based abstractions are a particular case of predicate abstraction [24]. However, the two restrictions imposed in this particular framework result in two strong benefits. First, the precision of the approximation is formally defined and precisely controlled using equations: $\mathcal{L}(\mathcal{A}^*) \subseteq (\mathcal{R}/E)^*(\mathcal{L}(\mathcal{A}))$ [20]. This precision property permits us to prove intricate properties with simple (regular) abstractions. Second, using tree automata-based models facilitates the certification of the verification results in a proof assistant. This significantly increases the confidence in the verification result compared, *e.g.*, to verdicts obtained by complex CEGAR-based model checkers.

### **8 Conclusion and Future Work**

This paper shows that tree automata completion is a simple yet powerful, fully automatic verification technique for higher-order functional programs, expressed as term rewriting systems. We have proved that the completion algorithm terminates on a subset of TRS encompassing common functional programs, and provided experimental evidence of the viability of the approach by verifying properties on fundamental higher-order functions including filtering and sorting.

One remaining question is whether this approach is complete: if there exists a regular approximation of the reachable terms of a functional program, can we build it using equations? We have already answered this question in the positive when $L = \mathcal{W}(\mathcal{C})$, i.e., when all results are first-order terms [15]. Extending this result to all kinds of results, including higher-order ones, is a promising research topic.

The generation of the approximating equations is automatic but simpleminded, and too simple to turn the prototype into a full verification tool. Further work will look into how sets of contracting equations can be generated in a more efficient manner, notably by taking the structure of the TRS into account and using a CEGAR approach.

The present verification technique is agnostic to the evaluation strategy. An interesting research track would be to experiment with completion-based verification techniques under different term rewriting semantics of functional programs, such as those outlined by Clemente *et al.* [10]. This would permit us to take a particular evaluation strategy into account and, in certain cases, improve the precision of the verification. We already experimented with this in [18]. This is in line with our long-term research goal of providing a light-weight verification tool to assist the working OCaml programmer.

Our work focuses on verifying regular properties represented by tree automata. Dealing with non-regular over-approximations of reachable terms would allow us to verify relational properties, like comparing the length of a list before and after *filter*. This is one of the objectives of techniques like [24]. Building non-regular over-approximations of reachable terms for TRSs, using a form of completion, is possible [5]. However, up to now, automatically adapting the precision of such approximations to a given verification goal is not possible. Extending their approach with equations may provide a powerful verification tool worth pursuing.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### Author Index

Aceto, Luca 203 Achilleos, Antonis 203 Ahmed, Amal 146 Altenkirch, Thorsten 293 Baldan, Paolo 165 Bansal, Suguman 420 Bazille, Hugo 403 Bouyer, Patricia 530 Capriotti, Paolo 293 Castellan, Simon 3 Chaudhuri, Swarat 420 Clairambault, Pierre 3 Clouston, Ranald 258 D'Argenio, Pedro R. 384 Dardha, Ornela 91 Demri, Stéphane 476 Devesas Campos, Marco 71 Dijkstra, Gabe 293 Docherty, Simon 441 Edalat, Abbas 459 Fabre, Eric 403 Francalanza, Adrian 203 Gay, Simon J. 91 Genest, Blaise 403 Genet, Thomas 565 Gerhold, Marcus 384 Goncharov, Sergey 313 Hartmanns, Arnd 384 Haudebourg, Timothée 565 Hayman, Jonathan 3 Herbelin, Hugo 276 Ingólfsdóttir, Anna 203 Jaber, Guilhem 20 Jacq, Clément 39 Jensen, Thomas 565 Katsumata, Shin-ya 110 Kesner, Delia 241 Kraus, Nicolai 293

Lasota, Sławomir 548 Le Roux, Stéphane 367 Levy, Paul Blain 71 Liu, Xinxin 221 Lozes, Étienne 476

Maleki, Mehrdad 459 Mansutti, Alessio 476 Melliès, Paul-André 39 Miquey, Étienne 276

New, Max 146 Nordvall Forsberg, Fredrik 293

Padoan, Tommaso 165 Pérez, Guillermo A. 367 Piórkowski, Radosław 548 Pym, David 441

Rabusseau, Guillaume 513 Ríos, Alejandro 241 Rioux, Nick 146

Sabry, Amr 348 Santocanale, Luigi 494 Scherer, Gabriel 146 Schröder, Lutz 313 Sedwards, Sean 384 Sokolova, Ana 331

Toninho, Bernardo 128 Tzevelekos, Nikos 20

Valiron, Benoît 348 van Glabbeek, Rob 183 Vardi, Moshe Y. 420 Viso, Andrés 241 Vizzotto, Juliana Kaizer 348

Winskel, Glynn 3 Woracek, Harald 331

Yoshida, Nobuko 128 Yu, Tingting 221

Zhang, Wenhui 221